
Thursday, 25 December 2014

Installing a multi-rack Hadoop cluster


In the previous blog post we saw how to use LXC to simulate multiple nodes. This post shows you how to configure a multi-rack cluster. Let's get started.


Step 1: Create your first Ubuntu container called webageTest1 – it will take a few minutes as some O/S packages will be downloaded. But they will be cached, so the next containers will be created in a few seconds.

lxc-create -t ubuntu -n webageTest1

Step 2: Install the LXC web GUI (LXC Web Panel), which we will use to create clones of the Ubuntu container.


wget http://lxc-webpanel.github.io/tools/install.sh -O - | bash

Step 3: Access it at http://localhost:5000 (default user/password: admin/admin) and create clones of the container so you have multiple boxes for the racks.
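The clones can also be created from the command line instead of the web panel. A sketch with lxc-clone, assuming the template container is webageTest1 and the names follow the hadoop[rack][node] convention used below (the clone command itself needs root, so it is left commented here):

```shell
# Build the list of container names hadoop11..hadoop19 and hadoop21..hadoop29
names=""
for rack in 1 2; do
  for node in 1 2 3 4 5 6 7 8 9; do
    names="$names hadoop${rack}${node}"
    # lxc-clone -o webageTest1 -n "hadoop${rack}${node}"   # uncomment to clone (run as root)
  done
done
echo $names
```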



Step 4: Install a JVM inside the container. The default LXC Ubuntu container has a minimal set of packages, so add the repository below, update the package lists, and install the JDK.

apt-get install software-properties-common
add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java7-installer

Step 5: Set up the network. By default, LXC uses DHCP to assign dynamic IP addresses to containers. Hadoop is sensitive to IP addresses (it occasionally uses them instead of host names), so assign static IPs to the Hadoop containers.
Open /etc/default/lxc-net and uncomment the following line:
LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf
Restart the lxc-net service afterwards so dnsmasq picks up the change.

Step 6: Create  /etc/lxc/dnsmasq.conf with the following content:
dhcp-host=hadoop11,10.0.1.111
dhcp-host=hadoop12,10.0.1.112
dhcp-host=hadoop13,10.0.1.113
dhcp-host=hadoop14,10.0.1.114
dhcp-host=hadoop15,10.0.1.115
dhcp-host=hadoop16,10.0.1.116
dhcp-host=hadoop17,10.0.1.117
dhcp-host=hadoop18,10.0.1.118
dhcp-host=hadoop19,10.0.1.119
dhcp-host=hadoop21,10.0.1.121
dhcp-host=hadoop22,10.0.1.122
dhcp-host=hadoop23,10.0.1.123
dhcp-host=hadoop24,10.0.1.124
dhcp-host=hadoop25,10.0.1.125
dhcp-host=hadoop26,10.0.1.126
dhcp-host=hadoop27,10.0.1.127
dhcp-host=hadoop28,10.0.1.128
dhcp-host=hadoop29,10.0.1.129

Note: the naming convention is that the host name is hadoop[rack number][node number] and the IP is 10.0.1.1[rack number][node number].
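Since the dhcp-host entries follow this pattern, you can generate the file instead of typing it by hand. A sketch (written to /tmp first so you can review it before moving it to /etc/lxc/dnsmasq.conf as root):

```shell
# Generate the dhcp-host entries for racks 1-2, nodes 1-9
for rack in 1 2; do
  for node in 1 2 3 4 5 6 7 8 9; do
    echo "dhcp-host=hadoop${rack}${node},10.0.1.1${rack}${node}"
  done
done > /tmp/dnsmasq.conf
# review /tmp/dnsmasq.conf, then move it to /etc/lxc/dnsmasq.conf as root
```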

Step 7: Add all containers to /etc/hosts. For this exercise, also copy these entries into hadoop11, which will be used as a template to clone the rest of the nodes:
10.0.1.111      hadoop11
10.0.1.112      hadoop12
10.0.1.113      hadoop13
10.0.1.114      hadoop14
10.0.1.115      hadoop15
10.0.1.116      hadoop16
10.0.1.117      hadoop17
10.0.1.118      hadoop18
10.0.1.119      hadoop19
10.0.1.121      hadoop21
10.0.1.122      hadoop22
10.0.1.123      hadoop23
10.0.1.124      hadoop24
10.0.1.125      hadoop25
10.0.1.126      hadoop26
10.0.1.127      hadoop27
10.0.1.128      hadoop28
10.0.1.129      hadoop29
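The hosts entries follow the same naming pattern, so they can be generated too. A sketch that writes the block to /tmp (append it to /etc/hosts on the host and inside the hadoop11 template afterwards):

```shell
# Generate the hosts entries for racks 1-2, nodes 1-9
for rack in 1 2; do
  for node in 1 2 3 4 5 6 7 8 9; do
    printf '10.0.1.1%s%s\thadoop%s%s\n' "$rack" "$node" "$rack" "$node"
  done
done > /tmp/hosts.snippet
# append /tmp/hosts.snippet to /etc/hosts on the host and in hadoop11
```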

Step 8: Setting up Hadoop.
Follow the steps in the "Installing and configuring Hadoop cluster" lab of our courseware (to get the complete courseware, mail info@xcelframeworks.com), set up Hadoop on hadoop11, and clone it to the other nodes. (HINT: use the "Setting up a Hadoop cluster" lab to complete the setup.)

Step 9: Setup /usr/local/hadoop-1.2.1/conf/topology.script.sh
#!/bin/bash
# Maps each host name or IP passed as an argument to a rack name
# by looking it up in topology.data
HADOOP_CONF=/usr/local/hadoop-1.2.1/conf
echo `date` input: $@ >> $HADOOP_CONF/topology.log
while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    # fall back to /rack01 for hosts missing from topology.data
    echo -n "/rack01 "
  else
    echo -n "$result "
  fi
done
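The NameNode only invokes this script if it is registered in core-site.xml; in Hadoop 1.x the property is topology.script.file.name (set it on the NameNode and restart it, and make the script executable with chmod +x):

```xml
<property>
  <name>topology.script.file.name</name>
  <value>/usr/local/hadoop-1.2.1/conf/topology.script.sh</value>
</property>
```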

Step 10: Create the topology.data file and place it in the conf folder.

hadoop11        /rack01
hadoop12        /rack01
hadoop13        /rack01
hadoop14        /rack01
hadoop15        /rack01
hadoop21        /rack02
hadoop22        /rack02
hadoop23        /rack02
hadoop24        /rack02
hadoop25        /rack02
hadoop31        /rack03
hadoop32        /rack03
hadoop33        /rack03
hadoop34        /rack03
hadoop35        /rack03
10.0.1.111      /rack01
10.0.1.112      /rack01
10.0.1.113      /rack01
10.0.1.114      /rack01
10.0.1.115      /rack01
10.0.1.121      /rack02
10.0.1.122      /rack02
10.0.1.123      /rack02
10.0.1.124      /rack02
10.0.1.125      /rack02
10.0.1.131      /rack03
10.0.1.132      /rack03
10.0.1.133      /rack03
10.0.1.134      /rack03
10.0.1.135      /rack03

Step 11:  Start the NameNode and see what gets logged:

# bin/hadoop-daemon.sh --config conf/ start namenode
starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-hadoop11.out
# cat conf/topology.log
Mon March  4 19:04:03 UTC 2014 input: 10.0.1.123 10.0.1.122 10.0.1.113 10.0.1.112 10.0.1.133 10.0.1.132

Step 12: As the NameNode started, it asked in a single call for the rack names of all our nodes. This is what the script returns to the NameNode:

# conf/topology.script.sh 10.0.1.123 10.0.1.122 10.0.1.113 10.0.1.112 10.0.1.133 10.0.1.132
/rack02 /rack02 /rack01 /rack01 /rack03 /rack03

Step 13: Check the block placement by copying content from local file system to HDFS.

# bin/hadoop fs -copyFromLocal /sw/big /big1
# bin/hadoop fs -copyFromLocal /sw/big /big2

Step 14: Check the console for the log output.

Step 15: Use the fsck command to see the block placement:

bin/hadoop fsck /big1 -files -blocks -racks

Step 16: Check the output; it should look like the sample below.

# bin/hadoop fsck /big1 -files -blocks -racks
FSCK started by root from /10.0.1.111 for path /big1 at Wed March 05 11:20:01 UTC 2014
/big1 130633102 bytes, 2 block(s):  OK
0. blk_3712902403633386081_1008 len=67108864 repl=3 [/rack03/10.0.1.133:50010, /rack03/10.0.1.132:50010, /rack02/10.0.1.123:50010]
1. blk_381038406874109076_1008 len=63524238 repl=3 [/rack03/10.0.1.132:50010, /rack03/10.0.1.133:50010, /rack01/10.0.1.113:50010]

Status: HEALTHY
 Total size:    130633102 B
 Total dirs:    0
 Total files:    1
 Total blocks (validated):    2 (avg. block size 65316551 B)
 Minimally replicated blocks:    2 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        6
 Number of racks:        3
FSCK ended at Wed March 05 11:20:01 UTC 2014 in 3 milliseconds

The filesystem under path '/big1' is HEALTHY



If you have any doubts, mail or contact us: info@xcelframeworks.com

