Guide to Cloudera QuickStart on VirtualBox for Windows

Hello everyone. In this article we are going to learn how to set up the Cloudera Quickstart VM on a Windows machine. When we start learning Hadoop, we need it installed on a server or a standalone system to practice on. Setting up Hadoop on a single machine is not very difficult, but there is an easier way to get a Hadoop environment quickly: Cloudera’s Quickstart VMs. These VMs come with Hadoop preconfigured, and they are free and quick to set up.

Prerequisite

To run the Cloudera Quickstart VM on a single system, the system should have at least 8 GB of RAM. It can run on a system with 4 GB of RAM, but performance will be very poor.

Next, we need to download a hypervisor. A hypervisor is a piece of software that runs on the host machine and hosts our VM. The Cloudera Quickstart VM can be set up with any of three hypervisors:

  1. VMware (requires a paid license after the free trial)
  2. VirtualBox (open source)
  3. KVM (open source)

You can choose any of these hypervisors. In this article, we are going to use VirtualBox. Download it from the VirtualBox website, choosing the Windows host package. Once the download is complete, install VirtualBox on your machine.

After that, we need to download the VM from Cloudera's download page. On that page, choose VirtualBox as the platform and click GET IT NOW. You will be asked to fill in a short form. Once you fill in these details and accept the terms and conditions, the VM download begins automatically. After the download is complete, we need to extract the VM archive. We can use 7-Zip for this.

Setting up VM

Start VirtualBox and click File -> Import Appliance. In the window that opens, browse to the folder where you extracted the Quickstart VM in the earlier step and choose the *.ovf file. Click Next, then Import.

VirtualBox now imports the Cloudera VM. When the import finishes, select the VM, click the Settings icon, then System, and open the Motherboard tab. There you can set how much RAM the VM can use. Set this value to 4 GB or more.

You can then run the VM by clicking the Start icon in VirtualBox.
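The import, memory, and start steps above can also be scripted with VirtualBox's `VBoxManage` command-line tool. The sketch below is a minimal example and assumes a Bash-compatible shell on the host (for example Git Bash) with `VBoxManage` on the PATH. The file name in `OVF_PATH` and the name in `VM_NAME` are placeholders, not values from this article. It only prints the commands so you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: import the extracted appliance, give it 4 GB of RAM, and start it.
# OVF_PATH and VM_NAME are placeholders (assumptions); substitute your own values.
OVF_PATH="${OVF_PATH:-cloudera-quickstart-vm.ovf}"
VM_NAME="${VM_NAME:-cloudera-quickstart}"
RAM_MB=4096   # 4 GB, the minimum suggested above

# Print the VBoxManage commands in order, one per line, for review.
vbox_setup_commands() {
  echo "VBoxManage import \"$OVF_PATH\" --vsys 0 --vmname \"$VM_NAME\""
  echo "VBoxManage modifyvm \"$VM_NAME\" --memory $RAM_MB"
  echo "VBoxManage startvm \"$VM_NAME\""
}

vbox_setup_commands
# To execute the commands instead of printing them:
#   vbox_setup_commands | sh
```

Note that `modifyvm` only works while the VM is powered off, which is why it comes before `startvm`.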

Validating the VM Setup

Once the VM is running, we can validate it by logging in to Hue. The default username and password for all Cloudera VMs are “cloudera” and “cloudera” respectively. From Hue you can check the different services, such as Hive and Pig.

Conclusion

In this article, we have set up the Cloudera Quickstart VM on Windows. We will use the same setup for our future tutorials and Hadoop practice. If you face any issues with this setup, please ask in the comment section below. See you in the next article.

Cloudera provides a software platform for data analytics, data warehousing, and machine learning. Cloudera started out with an open-source Apache Hadoop distribution, commonly known as Cloudera Distribution for Hadoop or CDH. CDH contains Apache Hadoop and other related projects, and all of its components are 100% open source under the Apache License.

Cloudera provides virtual machine images of complete Apache Hadoop clusters, making it easy to get started with Cloudera CDH.

What Is Cloudera QuickStart VM?

Cloudera QuickStart VM includes everything that you would need for using CDH, Impala, Cloudera Search, and Cloudera Manager. The Cloudera QuickStart VM uses a package-based install that allows you to work with or without the Cloudera Manager. It has a sample of Cloudera’s platform for “Big Data.”

Now that you have a brief understanding of what Cloudera QuickStart VM is, let’s have a look at the prerequisites to install Cloudera QuickStart VM.

Cloudera QuickStart VM Installation - Prerequisites 

  1. A hypervisor such as Oracle VirtualBox or VMware
  2. 12 GB or more of RAM: at least 4 GB for the host operating system and at least 8 GB for Cloudera
  3. 80 GB of hard disk space
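The arithmetic behind these prerequisites can be written as a small check. This is only a sketch: `check_prereqs` is a hypothetical helper, not part of any Cloudera or VirtualBox tool, and you supply the host's RAM and free disk space yourself, in whole gigabytes.

```shell
#!/usr/bin/env bash
# Sketch: compare a host's resources with the prerequisites listed above.
# check_prereqs is a hypothetical helper written for this example.

HOST_OS_RAM_GB=4     # RAM reserved for the host operating system
CLOUDERA_RAM_GB=8    # RAM reserved for the Cloudera VM
MIN_DISK_GB=80       # disk space the prerequisites ask for

check_prereqs() {
  local ram_gb=$1 disk_gb=$2
  local min_ram_gb=$((HOST_OS_RAM_GB + CLOUDERA_RAM_GB))   # 4 + 8 = 12
  if [ "$ram_gb" -lt "$min_ram_gb" ]; then
    echo "insufficient RAM: ${ram_gb} GB < ${min_ram_gb} GB"
    return 1
  fi
  if [ "$disk_gb" -lt "$MIN_DISK_GB" ]; then
    echo "insufficient disk: ${disk_gb} GB < ${MIN_DISK_GB} GB"
    return 1
  fi
  echo "ok"
}

# Example with made-up values: a 16 GB host with 120 GB of free disk space.
check_prereqs 16 120
```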

Downloading the Cloudera QuickStart VM

  • The Cloudera QuickStart VMs are openly available as Zip archives in VirtualBox, VMware, and KVM formats. To download the VM, go to https://www.cloudera.com/downloads.html and select the version of CDH that you require.

Fig: Download Cloudera QuickStart VM

  • Click on the ‘GET IT NOW’ button, and it will prompt you to fill in your details.
  • Once the file is downloaded, go to the download folder and unzip the archive. The extracted files can then be used to set up a single-node Cloudera cluster.
  • The extracted folder contains the two virtual image files of the Cloudera QuickStart VM.

  • Now that the download is done, let's move on to the actual installation process.

Cloudera QuickStart VM Installation

  • Before setting up the Cloudera virtual machine, you need a hypervisor such as VMware or Oracle VirtualBox on your system.
  • In this case, we are using Oracle VirtualBox to set up the Cloudera QuickStart VM.
  • To download and install Oracle VirtualBox on your operating system, click on the following link: Oracle VirtualBox.
  • To set up the Cloudera QuickStart VM in your Oracle VirtualBox Manager, click on ‘File’ and then select ‘Import Appliance’.

Fig: Importing the Cloudera QuickStart VM image

  • Choose the QuickStart VM image from your downloads folder. Click on ‘Open’ and then ‘Next’. Review the appliance specifications, then click on ‘Import’. This starts importing the virtual disk image (.vmdk file) into VirtualBox.

  • Wait a while for the import to finish. Once it is complete, the Cloudera QuickStart VM appears in the left side panel. The next step is to change the specifications of the machine; here, we give it 2 CPU cores and 5 GB of RAM.
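The resource settings used in this section (2 CPU cores, 5 GB of RAM, NAT on Adapter 1) can also be applied with `VBoxManage modifyvm` instead of the Settings dialog. The sketch below assumes a Bash-compatible shell on the host. The VM name `cloudera-quickstart` is a placeholder, and `gb_to_mb` and `modifyvm_command` are hypothetical helpers for this example. VirtualBox takes memory in megabytes, so 5 GB becomes 5120.

```shell
#!/usr/bin/env bash
# Sketch: build the VBoxManage command for the resource settings in this guide.
# The VM must be powered off when the printed command is run.

# VBoxManage expects memory in MB.
gb_to_mb() { echo $(( $1 * 1024 )); }

# Print (not run) the modifyvm command for a VM name, RAM in GB, and CPU count.
modifyvm_command() {
  local vm_name=$1 ram_gb=$2 cpus=$3
  echo "VBoxManage modifyvm \"$vm_name\" --memory $(gb_to_mb "$ram_gb") --cpus $cpus --nic1 nat"
}

# "cloudera-quickstart" is a placeholder; use the name shown in the left side panel.
modifyvm_command "cloudera-quickstart" 5 2
```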

Fig: Cloudera VM set up successful

  • To give the VM more RAM and CPU cores, click on ‘Settings’, followed by ‘System’, and increase the RAM to 5 GB. Click on the ‘Processor’ tab and assign 2 CPU cores. Then select ‘Network’. Adapter 1 should be attached to NAT by default. Click on ‘OK’.
  • Start the machine by clicking the ‘Start’ symbol at the top. It now boots the Cloudera QuickStart VM with 2 CPU cores and 5 GB of RAM.
  • Once the machine is up, the Cloudera QuickStart desktop appears.

  • Next, we have to follow a few steps to gain admin console access. Click on the terminal icon at the top of the desktop screen, and type in the following:

hostname # Prints the hostname, which should be quickstart.cloudera

hdfs dfs -ls / # Lists the HDFS root directory; confirms that you have access and that the cluster is working

service cloudera-scm-server status # Checks the status of the Cloudera Manager server

su - # Log in as root; the password for root is cloudera

service cloudera-scm-server status # Run the status check again as root

  • Once you see that your HDFS access is working fine, you can close the terminal. Then, you have to click on the following icon that says ‘Launch Cloudera Express’. 

  • Once you click on the express icon, a screen appears showing a command to run.

  • Copy that command and run it in a separate terminal. It shuts down the Cloudera services and restarts them, after which you can access your admin console.

Fig: Restarting services on Cloudera QuickStart VM

  • At this point the deployment has been configured and the client configurations have been deployed. The Cloudera Management Service has also been restarted, which gives access to the Cloudera QuickStart admin console with a username and password.
  • Open the browser and go to port 7180 on the VM.
  • Log in to Cloudera Manager by providing your username and password.

Fig: Logging in to Cloudera Manager
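Before opening the browser, it can help to check that something is answering on port 7180, since the services take a while to restart. The following is a rough sketch, assuming it runs inside the VM, that `curl` is installed there, and that the hostname is `quickstart.cloudera` as printed by `hostname` earlier. `cm_url` and `wait_for_cm` are hypothetical helper names for this example.

```shell
#!/usr/bin/env bash
# Sketch: build the Cloudera Manager URL and poll it until it responds.
CM_HOST="${CM_HOST:-quickstart.cloudera}"   # hostname reported by `hostname` in the VM
CM_PORT=7180                                # admin console port used in this guide

cm_url() { echo "http://${1}:${CM_PORT}"; }

# Poll the URL up to <attempts> times, sleeping <delay> seconds between tries.
wait_for_cm() {
  local url=$1 attempts=$2 delay=${3:-10}
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -s -o /dev/null --max-time 5 "$url"; then
      echo "up"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "down"
  return 1
}

echo "Cloudera Manager URL: $(cm_url "$CM_HOST")"
# Usage inside the VM (tries 30 times, 10 seconds apart):
#   wait_for_cm "$(cm_url "$CM_HOST")" 30
```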

  • Cloudera is CPU and memory intensive, so the VM can slow down if you haven’t assigned enough RAM to the Cloudera cluster. It is therefore recommended to stop or delete the services that you don’t need.
  • To remove a service, click on the drop-down button next to it and select ‘Delete’.

Fig: Deleting unnecessary services on Cloudera QuickStart VM

  • Before deleting a service, you must remove everything that depends on it. You can add services back to your cluster at any time.
  • In Cloudera Manager, you can also fix health issues and configuration issues within your cluster.

Fig: Solving Health and Configuration Issues on Cloudera QuickStart VM

  • You can now restart the services. This makes the cluster accessible either through Hue, the web interface, or through the Cloudera QuickStart terminal, where you can type commands.
  • You can switch to the hdfs user, which is the HDFS admin user. It usually has no password unless you have set one. You can then type any HDFS command in the terminal and see its output.
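As an alternative to switching users, a single HDFS command can be run as the hdfs user with `sudo -u hdfs`. The sketch below assumes the logged-in user inside the VM is allowed to use `sudo`. `run_as_hdfs` is a hypothetical wrapper, and the directory `/user/cloudera/practice` is a made-up example path. By default it only prints the commands; set `HDFS_DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: run HDFS file system commands as the hdfs admin user.
HDFS_DRY_RUN="${HDFS_DRY_RUN:-1}"   # 1 = only print the command, 0 = execute it

run_as_hdfs() {
  if [ "$HDFS_DRY_RUN" = "1" ]; then
    echo "sudo -u hdfs hdfs dfs $*"
  else
    sudo -u hdfs hdfs dfs "$@"
  fi
}

# List the HDFS root directory.
run_as_hdfs -ls /
# Create a practice directory (example path) and hand it to the cloudera user.
run_as_hdfs -mkdir -p /user/cloudera/practice
run_as_hdfs -chown cloudera /user/cloudera/practice
```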

Conclusion

Cloudera QuickStart VM lets you try out and administer Hadoop-related tools and services with little setup effort. In this article, we looked at what Cloudera QuickStart VM is and what the prerequisites for installing it are.

We also saw how to download the Cloudera QuickStart VM on Windows. Finally, we walked through a step-by-step process to install and configure it.

Want to know anything more about installing the Cloudera QuickStart VM? Comment on this article and our experts will get back to you at the earliest!