
Thursday, 25 December 2014

Installing a multi-rack Hadoop cluster


In the previous blog post we saw how to use LXC to simulate multiple nodes. This post shows you how to configure a multi-rack cluster. Let's get started.


Step 1: Create your first Ubuntu container called webageTest1 – it will take a few minutes as some O/S packages will be downloaded. But they will be cached, so the next containers will be created in a few seconds.

lxc-create -t ubuntu -n webageTest1

Step 2: Install the LXC web GUI (LXC Web Panel), which we will use to create clones of the Ubuntu container.


wget http://lxc-webpanel.github.io/tools/install.sh -O - | bash

Step 3: Access it at http://localhost:5000 (default user/password: admin/admin) and create clones of the container so you have multiple boxes for the racks.
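The clones can also be created from the command line instead of the web panel. A sketch with lxc-clone, assuming the template container is webageTest1 and the names follow the hadoop[rack][node] convention used below (the clone command itself needs root, so it is left commented here):

```shell
# Build the list of container names hadoop11..hadoop19 and hadoop21..hadoop29
names=""
for rack in 1 2; do
  for node in 1 2 3 4 5 6 7 8 9; do
    names="$names hadoop${rack}${node}"
    # lxc-clone -o webageTest1 -n "hadoop${rack}${node}"   # uncomment to clone (run as root)
  done
done
echo $names
```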



Step 4: Install a JVM inside the container. The default LXC Ubuntu container has a minimal set of packages, so add the repository below, update the package lists, and install the JDK.

apt-get install software-properties-common
add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java7-installer

Step 5: Set up the network. By default, LXC uses DHCP to assign dynamic IP addresses to containers. Hadoop is sensitive to IP addresses (it occasionally uses them instead of host names), so assign static IPs to the Hadoop containers.
Open /etc/default/lxc-net and uncomment the following line:
LXC_DHCP_CONFILE=/etc/lxc/dnsmasq.conf
Restart the lxc-net service afterwards so dnsmasq picks up the change.

Step 6: Create  /etc/lxc/dnsmasq.conf with the following content:
dhcp-host=hadoop11,10.0.1.111
dhcp-host=hadoop12,10.0.1.112
dhcp-host=hadoop13,10.0.1.113
dhcp-host=hadoop14,10.0.1.114
dhcp-host=hadoop15,10.0.1.115
dhcp-host=hadoop16,10.0.1.116
dhcp-host=hadoop17,10.0.1.117
dhcp-host=hadoop18,10.0.1.118
dhcp-host=hadoop19,10.0.1.119
dhcp-host=hadoop21,10.0.1.121
dhcp-host=hadoop22,10.0.1.122
dhcp-host=hadoop23,10.0.1.123
dhcp-host=hadoop24,10.0.1.124
dhcp-host=hadoop25,10.0.1.125
dhcp-host=hadoop26,10.0.1.126
dhcp-host=hadoop27,10.0.1.127
dhcp-host=hadoop28,10.0.1.128
dhcp-host=hadoop29,10.0.1.129

Note: the naming convention is that the host name is hadoop[rack number][node number] and the IP is 10.0.1.1[rack number][node number].
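Since the dhcp-host entries follow this pattern, you can generate the file instead of typing it by hand. A sketch (written to /tmp first so you can review it before moving it to /etc/lxc/dnsmasq.conf as root):

```shell
# Generate the dhcp-host entries for racks 1-2, nodes 1-9
for rack in 1 2; do
  for node in 1 2 3 4 5 6 7 8 9; do
    echo "dhcp-host=hadoop${rack}${node},10.0.1.1${rack}${node}"
  done
done > /tmp/dnsmasq.conf
# review /tmp/dnsmasq.conf, then move it to /etc/lxc/dnsmasq.conf as root
```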

Step 7: Add all containers to /etc/hosts. For this exercise, also copy these entries into hadoop11, which will be used as a template to clone the rest of the nodes:
10.0.1.111      hadoop11
10.0.1.112      hadoop12
10.0.1.113      hadoop13
10.0.1.114      hadoop14
10.0.1.115      hadoop15
10.0.1.116      hadoop16
10.0.1.117      hadoop17
10.0.1.118      hadoop18
10.0.1.119      hadoop19
10.0.1.121      hadoop21
10.0.1.122      hadoop22
10.0.1.123      hadoop23
10.0.1.124      hadoop24
10.0.1.125      hadoop25
10.0.1.126      hadoop26
10.0.1.127      hadoop27
10.0.1.128      hadoop28
10.0.1.129      hadoop29
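The hosts entries follow the same naming pattern, so they can be generated too. A sketch that writes the block to /tmp (append it to /etc/hosts on the host and inside the hadoop11 template afterwards):

```shell
# Generate the hosts entries for racks 1-2, nodes 1-9
for rack in 1 2; do
  for node in 1 2 3 4 5 6 7 8 9; do
    printf '10.0.1.1%s%s\thadoop%s%s\n' "$rack" "$node" "$rack" "$node"
  done
done > /tmp/hosts.snippet
# append /tmp/hosts.snippet to /etc/hosts on the host and in hadoop11
```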

Step 8: Setting up Hadoop.
Follow the steps in the "Installing and configuring Hadoop cluster" lab of our courseware (to get the complete courseware, mail info@xcelframeworks.com), set up Hadoop on hadoop11, and clone it to the other nodes. (HINT: use the "Setting up a Hadoop cluster" lab to complete the setup.)

Step 9: Setup /usr/local/hadoop-1.2.1/conf/topology.script.sh
#!/bin/bash
# Maps each host name or IP passed as an argument to a rack name
# by looking it up in topology.data
HADOOP_CONF=/usr/local/hadoop-1.2.1/conf
echo `date` input: $@ >> $HADOOP_CONF/topology.log
while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    # fall back to /rack01 for hosts missing from topology.data
    echo -n "/rack01 "
  else
    echo -n "$result "
  fi
done
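The NameNode only invokes this script if it is registered in core-site.xml; in Hadoop 1.x the property is topology.script.file.name (set it on the NameNode and restart it, and make the script executable with chmod +x):

```xml
<property>
  <name>topology.script.file.name</name>
  <value>/usr/local/hadoop-1.2.1/conf/topology.script.sh</value>
</property>
```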

Step 10: Create the topology.data file and place it in the conf folder.

hadoop11        /rack01
hadoop12        /rack01
hadoop13        /rack01
hadoop14        /rack01
hadoop15        /rack01
hadoop21        /rack02
hadoop22        /rack02
hadoop23        /rack02
hadoop24        /rack02
hadoop25        /rack02
hadoop31        /rack03
hadoop32        /rack03
hadoop33        /rack03
hadoop34        /rack03
hadoop35        /rack03
10.0.1.111      /rack01
10.0.1.112      /rack01
10.0.1.113      /rack01
10.0.1.114      /rack01
10.0.1.115      /rack01
10.0.1.121      /rack02
10.0.1.122      /rack02
10.0.1.123      /rack02
10.0.1.124      /rack02
10.0.1.125      /rack02
10.0.1.131      /rack03
10.0.1.132      /rack03
10.0.1.133      /rack03
10.0.1.134      /rack03
10.0.1.135      /rack03

Step 11:  Start the NameNode and see what gets logged:

# bin/hadoop-daemon.sh --config conf/ start namenode
starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-hadoop11.out
# cat conf/topology.log
Mon March  4 19:04:03 UTC 2014 input: 10.0.1.123 10.0.1.122 10.0.1.113 10.0.1.112 10.0.1.133 10.0.1.132

Step 12: As the NameNode started, it asked in a single call for the rack names of all our nodes. This is what the script returns to the NameNode:

# conf/topology.script.sh 10.0.1.123 10.0.1.122 10.0.1.113 10.0.1.112 10.0.1.133 10.0.1.132
/rack02 /rack02 /rack01 /rack01 /rack03 /rack03

Step 13: Check the block placement by copying content from local file system to HDFS.

# bin/hadoop fs -copyFromLocal /sw/big /big1
# bin/hadoop fs -copyFromLocal /sw/big /big2

Step 14: Check the console for the log output.

Step 15: Use the fsck command to see the block placement:

bin/hadoop fsck /big1 -files -blocks -racks

Step 16: Check the output; it should look like the sample below.

# bin/hadoop fsck /big1 -files -blocks -racks
FSCK started by root from /10.0.1.111 for path /big1 at Wed March 05 11:20:01 UTC 2014
/big1 130633102 bytes, 2 block(s):  OK
0. blk_3712902403633386081_1008 len=67108864 repl=3 [/rack03/10.0.1.133:50010, /rack03/10.0.1.132:50010, /rack02/10.0.1.123:50010]
1. blk_381038406874109076_1008 len=63524238 repl=3 [/rack03/10.0.1.132:50010, /rack03/10.0.1.133:50010, /rack01/10.0.1.113:50010]

Status: HEALTHY
 Total size:    130633102 B
 Total dirs:    0
 Total files:    1
 Total blocks (validated):    2 (avg. block size 65316551 B)
 Minimally replicated blocks:    2 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        6
 Number of racks:        3
FSCK ended at Wed March 05 11:20:01 UTC 2014 in 3 milliseconds

The filesystem under path '/big1' is HEALTHY



If you have any doubts, mail or contact us: info@xcelframeworks.com

