How to configure Hadoop 1.0 on Windows 7


Trouble no. 009

This post explains how to configure Hadoop 1.0 on Windows 7.

Just follow the step-by-step procedure given below; brief definitions of the relevant terms are included along the way.

What is Hadoop?

Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

We need some software to complete this task:

Cygwin:

We need to download the Unix command-line tool Cygwin. Cygwin is a large collection of GNU and open-source tools that provide functionality similar to a Linux distribution on Windows. It is needed to run the scripts supplied with Hadoop, because they are all written for the Unix platform.
Download the Cygwin installer from the link below. Select the 32-bit or 64-bit version to match your operating system.

Java:
We need Java 1.6 or a later version. Download JDK 1.6 from the Java SE 6 Downloads page.

Now we need to install Cygwin. Follow the steps given below.

Install Cygwin

Step 1: Run Cygwin setup file.

Step 2: On the installation screen, choose “Install from Internet” and click Next.


Step 3: Set the Root Directory (installation path) to “C:\cygwin”.


Step 4: Set the Local Package Directory to “C:\cygwin” as well.


Step 5: On the “Select Connection Type” window, choose whichever option suits your network. If you are new to this procedure, I recommend selecting Direct Connection.


Step 6: On the next window, a list of download sites is shown. Choose any one of them, and click Next.


Step 7: Click Next until you are asked to select packages.

Step 8: On the “Select Packages” window, type “open” in the search box. Expand “Net” and click “Skip” next to OpenSSH and OpenSSL to select them for installation.


Step 9: Type “dos2unix” in the search box, expand “Utils”, and click “Skip” next to dos2unix to select it.


Step 10: After selecting all required packages, click Next and wait for the installation to complete.


Step 11: Click Finish to complete the installation process.

Cygwin is now installed.


OpenSSH Configuration:

  • We need to configure the SSH components used by the Hadoop scripts.
  • Right-click the Cygwin icon and select the “Run as administrator” option.

Now we will do some stuff on Cygwin Command line.

Step 1: Type the following command to start configuring OpenSSH.

$ ssh-host-config

The script will ask a series of questions; answer them as shown below.

*** Query: Should privilege separation be used? (yes/no) yes

*** Query: new local account ‘sshd’? (yes/no) yes

*** Query: Do you want to install sshd as a service? (yes/no) yes

(Answer no here if the service is already installed.)

*** Query: Enter the value of CYGWIN for the daemon: [] ntsec

You’ll see the script print some information about your system, and then it will ask you to create a privileged account with the default username “cyg_server”. The default works well, so type “no” when it asks whether you want to use a different account name; otherwise, change it if you wish.


Type any passphrase you want!

That completes the configuration. You’ll see the message,

Configuration finished. Have fun!

To start the sshd service, type the following command.

$ net start sshd


User Configuration of SSH

Now you need to create an SSH key for your account.

Type the following command.

$ ssh-user-config

Follow the steps given below.

*** Query: Shall I create a SSH2 RSA identity file for you? (yes/no) yes

Enter your passphrase.

*** Query: Do you want to use this identity to login to this machine? (yes/no) yes

*** Query: Shall I create a SSH2 DSA identity file for you? (yes/no) no

*** Query: Shall I create a SSH2 ECDSA identity file for you? (yes/no) no

*** Query: Shall I create a (deprecated) SSH1 RSA identity file for you? (yes/no) no


Finally, it’s all done! To check your configuration, type the command below in your Cygwin window.

$ ssh -v localhost
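Beyond “ssh -v”, the sketch below (my own addition, assuming sshd is already running on the default port) checks that key-based login works without a password prompt:

```shell
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so success here means the key created by ssh-user-config is in use.
if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
  echo "passwordless ssh to localhost: OK"
  ok=1
else
  echo "passwordless ssh to localhost: FAILED (re-run ssh-user-config)"
  ok=0
fi
```

If this reports FAILED even though “net start sshd” succeeded, a common cause is a mistyped passphrase or a missing entry in ~/.ssh/authorized_keys.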


Install JAVA

  • Requirement: Java 1.6 (64-bit) or later

Install Java directly in the C:\ folder, i.e. C:\JAVA.

If you have already installed Java under Program Files, uninstall it and reinstall it in the “C:\” folder; the space in “Program Files” breaks the Hadoop shell scripts.

Setting up Cygwin and JAVA Environment.

  1. Right-click “My Computer” and select Properties from the menu.
  2. Click on “Advanced System Settings”.
    In the “System Properties” window, click the “Environment Variables” button and locate the PATH variable in the “System Variables” section.
  3. Append the Cygwin bin folder path C:\cygwin\bin; to the PATH value and click OK.
  4. Add a new system variable JAVA_HOME and set it to the installed Java path C:\JAVA\
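Once both variables are set, you can verify them from a fresh Cygwin terminal. A minimal sketch, assuming Java was installed to C:\JAVA as above (under Cygwin that folder appears as /cygdrive/c/JAVA):

```shell
# In Cygwin, the Windows drive C:\ is mounted at /cygdrive/c.
export JAVA_HOME="/cygdrive/c/JAVA"   # mirrors the JAVA_HOME system variable
echo "JAVA_HOME = $JAVA_HOME"
# 'which java' should print a path once C:\cygwin\bin and the JDK
# are on PATH; if it prints nothing, re-check the PATH edit above.
which java || echo "java not found on PATH yet"
```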

Download Hadoop 1.2.1 Package

Now, let’s download the Hadoop package.


  • Execute the command “$ explorer .” to open your Cygwin home directory in a Windows Explorer window.
  • Copy the Hadoop archive you just downloaded into the home directory folder that just opened.

 


 Unpack the Hadoop Package

    1. Open a Cygwin terminal (Run as administrator).
    2. Execute the tar command below to start unpacking the Hadoop package.

 

$ tar -xzf hadoop-1.2.1.tar.gz
(hadoop-1.2.1 is the latest 1.x package.)
This process may take some time; afterwards the Cygwin prompt will return.
Execute the command “$ ls -l” to list the contents of the home directory; there will be a new folder named hadoop-1.2.1.
To enter that folder, type “$ cd hadoop-1.2.1”.
If you see files in the hadoop-1.2.1 directory, you unpacked hadoop-1.2.1 successfully.
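If you are new to tar, here is a self-contained demonstration of the flags used above (-x extract, -z gunzip, -f archive file) on a throwaway archive instead of the real Hadoop package:

```shell
# Build a tiny archive, delete the source tree, then extract it back.
mkdir -p demo/conf
echo "sample" > demo/conf/site.xml
tar -czf demo.tar.gz demo     # -c create, -z gzip, -f archive file
rm -rf demo                   # remove the original folder
tar -xzf demo.tar.gz          # -x extract: recreates demo/conf/site.xml
ls demo/conf                  # prints: site.xml
```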

Hadoop Configuration

  1. Open Cygwin and type the command “$ explorer .” to open the home folder.
  2. Create a folder named “hadoop-dir”, and inside it create two folders named “datadir” and “namedir”.
  3. In the Cygwin command prompt, execute the commands below,

 

$ chmod 755 hadoop-dir
$ cd hadoop-dir
$ chmod 755 datadir
$ chmod 755 namedir
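As an alternative to creating the folders in Explorer (step 2), the whole thing can be done from the Cygwin prompt; a minimal sketch, run from your home directory:

```shell
# mkdir -p creates hadoop-dir and both subfolders in one call.
mkdir -p hadoop-dir/datadir hadoop-dir/namedir
# 755: owner may read/write/enter; group and others may read/enter.
chmod 755 hadoop-dir hadoop-dir/datadir hadoop-dir/namedir
ls -ld hadoop-dir hadoop-dir/datadir hadoop-dir/namedir
```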


4. Execute the following commands on the Cygwin command prompt.

 

$ cd ~/hadoop-1.2.1
$ cd conf
$ explorer .

 

By executing “$ explorer .”, a Windows Explorer window will open.

Search for and open the file named “hadoop-env.sh”.

  • Uncomment the line which contains “export JAVA_HOME” and provide your Java path.

 

export JAVA_HOME="C:\JAVA"
  • Next, open “core-site.xml” and add the code below,

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:50000</value>
  </property>
</configuration>


  • Open the “mapred-site.xml” file and add the code below,

 

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:50001</value>
  </property>
</configuration>
  • Open the “hdfs-site.xml” file and add the code below.
  • Change USER_FOLDER_NAME in the code to your own user name.

 

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/USER_FOLDER_NAME/hadoop-dir/datadir</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/USER_FOLDER_NAME/hadoop-dir/namedir</value>
  </property>
</configuration>

 


  • Run the dos2unix command on every file we changed, so that the Windows line endings left by the editor don’t break the shell scripts.

 

$ dos2unix.exe hadoop-env.sh
$ dos2unix.exe core-site.xml
$ dos2unix.exe hdfs-site.xml
$ dos2unix.exe mapred-site.xml
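If dos2unix was not selected during the Cygwin install, stripping the carriage returns with sed achieves the same result. A sketch on a throwaway file (sed ships with Cygwin):

```shell
# Create a file with Windows (CRLF) line endings.
printf 'export JAVA_HOME="C:\\JAVA"\r\n' > demo-env.sh
# 's/\r$//' deletes the trailing carriage return on every line,
# which is exactly the conversion dos2unix performs; -i edits in place.
sed -i 's/\r$//' demo-env.sh
od -c demo-env.sh | head -n 2   # the \r bytes are gone
```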

 


Bingo! You have successfully installed Hadoop on Windows!



Format the NameNode and Run Hadoop Daemons:

  • Open a Cygwin terminal (Run as administrator) and execute the following commands.

 

$ cd hadoop-1.2.1
$ bin/hadoop namenode -format 

 

This will take some time.

 

Start Hadoop Daemons:

Once the filesystem has been created, the next step is to start the Hadoop cluster daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.

Restart the Cygwin terminal and execute the command below to start all daemons on the Hadoop cluster.

$ bin/start-all.sh

This command starts all the services in the cluster, and your Hadoop cluster is now running.
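To confirm which daemons actually came up, the JDK's jps tool lists the running Java processes. The check below is my own addition (run it in the Cygwin terminal after start-all.sh; it tolerates jps being absent):

```shell
# jps ships with the JDK and prints one line per running Java process.
expected="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"
running=$(jps 2>/dev/null || true)
missing=0
for d in $expected; do
  if echo "$running" | grep -q "$d"; then
    echo "$d: running"
  else
    echo "$d: NOT running"
    missing=$((missing + 1))
  fi
done
echo "$missing of 5 daemons not detected"
```

If a daemon is missing, its log file under the logs folder of the Hadoop directory is the first place to look.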

Stop Hadoop Daemons:
To stop all the daemons, we can execute the command
$ bin/stop-all.sh

Web interface for the NameNode and the JobTracker:

After you have started the Hadoop daemons using bin/start-all.sh, you can check the NameNode and JobTracker in a browser.
By default they are available at the addresses below.

The web interfaces for these services provide information on and the status of each component. They are the first entry point for a view of the state of a Hadoop cluster.

NameNode Web Interface:
The NameNode web interface can be accessed via the URL http://<host>:50070/
In this case, http://localhost:50070/

JobTracker Web Interface:
The JobTracker web interface can be accessed via the URL http://<namenode_host>:50030/
In this case, http://localhost:50030/

