Trouble no. 009
This post explains how to configure Hadoop 1.x (version 1.2.1) on Windows 7.
Just follow the step-by-step procedure given below; short definitions of the key terms are included along the way.
What is Hadoop?
Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
We need some software to complete our task:
First, download the Unix command-line toolset Cygwin. Cygwin is a large collection of GNU and open-source tools that provide functionality similar to a Linux distribution on Windows. It is needed to run the scripts supplied with Hadoop, because they are all written for the Unix platform.
Download the Cygwin installer from the Cygwin website, choosing the 32-bit or 64-bit version to match your operating system.
Next, download Java 1.6 or later. JDK 1.6 is available from the Java SE 6 Downloads page.
Now, install Cygwin by following the steps below:
Step 1: Run the Cygwin setup file.
Step 2: On the installation screen, choose “Install from Internet” and click Next.
Step 3: Set the Root Directory (installation path) to “C:\cygwin”.
Step 4: Set the Local Package Directory to “C:\cygwin” as well.
Step 5: On the “Select Connection Type” window, choose the option that fits your network. If you are new to this procedure, I recommend selecting Direct Connection.
Step 6: On the next window, many download sites are listed. Choose any one of them and click Next.
Step 7: Keep clicking Next until you are asked to select packages.
Step 8: On the “Select Packages” window, type “open” in the search box, expand “Net”, and click “Skip” next to OpenSSH and OpenSSL so that they are selected for installation.
Step 9: Type “dos2unix” in the search box, expand “Utils”, and click “Skip” next to dos2unix to select it.
Step 10: After selecting all required packages, click Next and wait for the installation to finish.
Step 11: Click Finish to complete the installation.
Cygwin is now installed.
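Before configuring SSH, it can help to confirm that the packages selected above actually ended up on your PATH. This is a minimal sketch: check_tools is a hypothetical helper (not part of Cygwin), and the list of commands it checks (ssh, ssh-keygen, openssl, dos2unix) is an assumption based on the OpenSSH, OpenSSL and dos2unix packages chosen in Steps 8 and 9.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are on PATH.
# The tool list used at the bottom is an assumption based on the packages
# selected during the Cygwin installation.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing tools (0 means all were found).
  return "$missing"
}

check_tools ssh ssh-keygen openssl dos2unix ||
  echo "Re-run the Cygwin setup program and select the missing packages."
```

Any tool reported as missing points to a package that was not selected; running the Cygwin setup program again lets you add it without reinstalling everything.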
- Next, configure the SSH components; the Hadoop scripts use SSH to start the daemons.
- Right-click the Cygwin icon and select the “Run as administrator” option.
The following steps are performed on the Cygwin command line.
Step 1: Type the following command to start configuring OpenSSH:
$ ssh-host-config
The script asks a series of questions. Answer them as shown below:
“*** Query: Should privilege separation be used (yes/no): yes”
“*** Query: new local account ‘sshd’? (yes/no): yes”
“*** Query: Do you want to install sshd as a service? (yes/no): yes”
Answer no if sshd is already installed as a service.
“*** Query: Enter the value of CYGWIN for the daemon: ntsec”
The script then prints some information about your system and asks you to create a privileged account with the default username “cyg_server”. The default works well, so type “no” when asked whether you want to use a different account name; otherwise, answer “yes” and choose your own name.
When prompted, enter and confirm a password of your choice for this account.
Once everything is done, you’ll see the message:
Configuration finished. Have fun!
To start the sshd service, type the following command:
$ net start sshd
User Configuration of SSH
Now you need to create an SSH key for your account.
Type the following command:
$ ssh-user-config
Then answer its questions as shown below:
“*** Query: Shall I create a SSH2 RSA identity file for you? (yes/no) yes”
Enter your passphrase.
“*** Query: Do you want to use this identity to login to this machine? (yes/no) yes”
“*** Query: Shall I create a SSH2 DSA identity file for you? (yes/no) no”
“*** Query: Shall I create a SSH2 ECDSA identity file for you? (yes/no) no”
“*** Query: Shall I create a (deprecated) SSH1 RSA identity file for you? (yes/no) no”
That’s all. To check your configuration, type the following command in your Cygwin window:
$ ssh -v localhost
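If you would rather not type the passphrase each time start-all.sh opens an SSH connection to localhost, one common alternative (not the approach used above) is a key with an empty passphrase. The sketch below assumes that is acceptable on a single-user test machine. setup_passwordless_ssh is a hypothetical helper name, and it reuses an existing id_rsa (including one created by ssh-user-config, passphrase and all) rather than replacing it.

```shell
#!/bin/sh
# Hypothetical helper. ASSUMPTION: an empty passphrase is acceptable on this
# single-user test machine. Creates ~/.ssh/id_rsa without a passphrase (only
# if no key exists yet) and authorizes it for logins to this machine.
setup_passwordless_ssh() {
  ssh_dir="${1:-$HOME/.ssh}"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"

  # Generate an RSA key with an empty passphrase (-N '') unless one exists.
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -N '' -f "$ssh_dir/id_rsa"
  fi

  # Append the public key once; -x -F matches the whole line literally,
  # so re-running the helper does not add duplicate entries.
  touch "$ssh_dir/authorized_keys"
  pub_key=$(cat "$ssh_dir/id_rsa.pub")
  if ! grep -qxF -- "$pub_key" "$ssh_dir/authorized_keys"; then
    printf '%s\n' "$pub_key" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}
```

After defining the function, run:
$ setup_passwordless_ssh
$ ssh -o BatchMode=yes localhost true && echo ok
BatchMode=yes makes ssh fail instead of prompting, so “ok” means logins to localhost need no interaction.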
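- Requirement: Java 1.6 or later, 64-bit
Install Java directly under the C:\ drive, i.e. in C:\JAVA.
If you have already installed Java in Program Files, uninstall it and reinstall it under “C:\”, because a space in the Java path (as in “Program Files”) tends to break the Hadoop scripts running under Cygwin.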
Setting up the Cygwin and Java Environment
- Right-click “My Computer” and select “Properties” from the menu.
- Click “Advanced System Settings”.
- In the “System Properties” window, click the “Environment Variables” button and locate the PATH variable in the “System Variables” section.
- Edit PATH, add the bin folder of the installed Cygwin (C:\cygwin\bin;) and click OK.
- Add a new system variable named JAVA_HOME whose value is the installed Java path (C:\JAVA\).
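Once the variables are set, a new Cygwin window inherits them, so you can sanity-check JAVA_HOME from there. The sketch below only enforces the two rules implied above (the variable must be set, and the path must not contain a space); check_java_home is a hypothetical helper, not part of Cygwin or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: reject JAVA_HOME values that are empty or contain
# a space (such as a path under "C:\Program Files").
check_java_home() {
  jh="$1"
  case "$jh" in
    "")
      echo "JAVA_HOME is not set"
      return 1 ;;
    *" "*)
      # printf instead of echo so backslashes in Windows paths print as-is.
      printf 'JAVA_HOME contains a space: %s\n' "$jh"
      return 1 ;;
  esac
  printf 'JAVA_HOME looks usable: %s\n' "$jh"
}
```

Usage in a freshly opened Cygwin window; cygpath -u converts the Windows path to its Cygwin form so the java binary can be run directly:
$ check_java_home "$JAVA_HOME"
$ "$(cygpath -u "$JAVA_HOME")/bin/java" -version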
Download Hadoop 1.2.1 Package
Now, let’s download the Hadoop package.
- Download Hadoop 1.2.1.
- Open Cygwin terminal (Run as Administrator).
- Execute the command “$ explorer .” to locate your home directory. It opens the Cygwin home directory folder in Windows Explorer.
- Copy the Hadoop package you just downloaded and place it in the home directory folder that just opened.
Unpacking the Hadoop Package
- Open a Cygwin terminal (Run as Administrator).
- Execute the tar command below to unpack the Hadoop package (assuming the downloaded archive is named hadoop-1.2.1.tar.gz):
$ tar -xvzf hadoop-1.2.1.tar.gz
- In Cygwin, type the command “$ explorer .” to open the home folder.
- Create a folder named “hadoop-dir”, and inside it create two folders named “datadir” and “namedir”.
- At the Cygwin command prompt, execute the following commands:
$ chmod 755 hadoop-dir
$ cd hadoop-dir
$ chmod 755 datadir
$ chmod 755 namedir
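The folder creation and chmod commands above can also be done in one step from the Cygwin prompt. This is a minimal sketch; make_hadoop_dirs is a hypothetical helper, and it assumes hadoop-dir should live in your Cygwin home directory unless you pass a different base folder.

```shell
#!/bin/sh
# Hypothetical helper: create hadoop-dir/datadir and hadoop-dir/namedir
# and give all three folders 755 permissions, as in the commands above.
# The base folder defaults to $HOME (an assumption; pass another path).
make_hadoop_dirs() {
  base="${1:-$HOME}/hadoop-dir"
  # -p creates missing parents and does not fail if the folders exist.
  mkdir -p "$base/datadir" "$base/namedir"
  chmod 755 "$base" "$base/datadir" "$base/namedir"
  printf 'created %s\n' "$base"
}
```

Running make_hadoop_dirs with no argument creates ~/hadoop-dir; because of mkdir -p it is safe to run again if the folders already exist.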
- Execute the following commands at the Cygwin command prompt:
$ cd ~/hadoop-1.2.1
$ cd conf
$ explorer .
Running “$ explorer .” opens a Windows Explorer window.
Find and open the file named “hadoop-env.sh”.
- Uncomment the line that contains “export JAVA_HOME” and set it to your Java path.
- Next, open “core-site.xml” and add the required configuration properties.
- Open “mapred-site.xml” and add the required configuration properties.
- Open “hdfs-site.xml” and add the required configuration properties.
- In the hdfs-site.xml configuration, replace USER_FOLDER_NAME with your own user name.
- Run the dos2unix command on every file you changed, to convert Windows (CRLF) line endings to Unix (LF) ones:
$ dos2unix.exe hadoop-env.sh
$ dos2unix.exe core-site.xml
$ dos2unix.exe hdfs-site.xml
$ dos2unix.exe mapred-site.xml
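The conf/ edits above can also be made from the Cygwin prompt instead of a Windows editor. The sketch below uses two hypothetical helpers: set_java_home uncomments the JAVA_HOME line with GNU sed (as shipped with Cygwin), and write_site_files overwrites the three XML files. The property names (fs.default.name, dfs.replication, dfs.name.dir, dfs.data.dir, mapred.job.tracker) are standard Hadoop 1.x settings, but the values (hdfs://localhost:9000, localhost:9001, replication factor 1) are assumptions taken from the usual single-node setup and may differ from what this guide originally used. The namedir/datadir paths are passed in; the form Java on Windows expects for them is also an assumption to verify.

```shell
#!/bin/sh
# Hypothetical helpers; every value below is an assumption for a
# single-node setup, not necessarily what this guide originally used.

# Uncomment the JAVA_HOME line in hadoop-env.sh ($1) and set it to $2.
set_java_home() {
  env_file="$1"
  java_home="$2"
  sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
}

# Overwrite core-site.xml, hdfs-site.xml and mapred-site.xml in conf dir $1.
# $2 and $3 are the namedir and datadir paths to store HDFS data in.
write_site_files() {
  conf_dir="$1"
  name_dir="$2"
  data_dir="$3"

  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>$name_dir</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>$data_dir</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```

Example usage after loading the functions (USER_FOLDER_NAME stays a placeholder for your user name, and the C:/cygwin/... path form is an assumption):
$ cd ~/hadoop-1.2.1/conf
$ set_java_home hadoop-env.sh /cygdrive/c/JAVA
$ write_site_files . C:/cygwin/home/USER_FOLDER_NAME/hadoop-dir/namedir C:/cygwin/home/USER_FOLDER_NAME/hadoop-dir/datadir
Files written by the Cygwin shell already use Unix line endings, so dos2unix is only needed for files you edited in a Windows editor.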
Bingo! You have successfully installed Hadoop on Windows.
Format the NameNode and Run Hadoop Daemons:
- Open a Cygwin terminal (Run as Administrator) and execute the following commands:
$ cd hadoop-1.2.1
$ bin/hadoop namenode -format
This may take some time.
Start Hadoop Daemons:
Once the filesystem has been created, the next step is to start the Hadoop cluster daemons: NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.
Restart the Cygwin terminal and execute the following commands to start all daemons of the Hadoop cluster:
$ cd hadoop-1.2.1
$ bin/start-all.sh
This command starts all the services in the cluster, and your Hadoop cluster is now running.
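To confirm that all five daemons actually came up, you can list the running Java processes with jps, which ships in the JDK’s bin folder. The sketch below is a hypothetical helper, check_daemons, that reads jps output on stdin; it strips carriage returns first because a Windows jps may end its lines with CR+LF (an assumption worth keeping even if yours does not).

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and report which of the
# five Hadoop 1.x daemons are present. Returns 1 if any are missing.
check_daemons() {
  # Windows programs may print CR+LF line endings; drop the CRs.
  listing=$(tr -d '\r')
  status=0
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # jps prints "<pid> <MainClass>"; require a space before the name and
    # end-of-line after it so "NameNode" does not match "SecondaryNameNode".
    if printf '%s\n' "$listing" | grep -q " $daemon\$"; then
      echo "running: $daemon"
    else
      echo "missing: $daemon"
      status=1
    fi
  done
  return "$status"
}
```

Usage, assuming the JDK bin folder is on your PATH:
$ jps | check_daemons
A daemon reported as missing usually left an explanation in its log file under the logs folder of hadoop-1.2.1.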
Web Interfaces for the NameNode and the JobTracker:
After starting the Hadoop daemons with bin/start-all.sh, you can check the NameNode and the JobTracker in a browser.
By default they are available at the addresses below.
The web interfaces for these services show information about the status of each component. They are the first place to look for an overview of the state of a Hadoop cluster.
The NameNode web interface can be accessed via the URL http://<namenode_host>:50070/
In this case, http://localhost:50070/
The JobTracker web interface can be accessed via the URL http://<jobtracker_host>:50030/
In this case, http://localhost:50030/
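If you prefer to check the web interfaces from the Cygwin prompt, the sketch below probes a URL and reports whether it answers with HTTP 200. It assumes the optional curl package was selected in the Cygwin setup; check_ui is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: probe a web UI and report whether it answers with
# HTTP 200. ASSUMPTION: the curl package is installed in Cygwin.
check_ui() {
  url="$1"
  # -s: no progress output; -L: follow redirects; -o /dev/null: discard
  # the page; -w: print only the final status code; --max-time: give up
  # after 5 seconds. curl prints 000 when nothing is listening.
  code=$(curl -s -L -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
  if [ "$code" = "200" ]; then
    printf 'up:   %s\n' "$url"
    return 0
  fi
  printf 'down: %s (HTTP status %s)\n' "$url" "${code:-none}"
  return 1
}
```

Usage against the default addresses listed above:
$ check_ui http://localhost:50070/
$ check_ui http://localhost:50030/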