Ambari/HDP – Install & Configure an HDP BigData environment

Ambari is an open source tool that provides central web management, configuration and installation of various BigData services.

HDP stands for Hortonworks Data Platform. Hortonworks provides a complete distribution (the HDP repository) from which all the main BigData tools and services can be installed through Ambari itself. See the Hortonworks site for details.

This guide will show you how you can start with a small BigData environment (can also be a one node environment) which can be used for testing purposes. You can then re-use this guide to productionize a larger cluster once you have a good grasp of what each service provides.


Pre-Requisites

You will need some pre-requisites before starting:

  • Machine Specifications (for a single-node install, use the values below)
    • You can go lower than the specs below, but you might see performance degradation and services failing to start, mainly because of insufficient memory
      • CPU: 4
      • RAM: 8GB
    • For a production cluster, the machine specifications depend on the use case and requirements of each company.
  • Operating System (any of the below):
    • RedHat 6/7
    • CentOS 6/7
    • Ubuntu 14/16
    • Debian 7
    • OpenSUSE 11/12
  • Java
    • Version 1.8
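
If you want to check a machine against these numbers before starting, a small script like the one below can help. It is only a sketch: the check_min helper and the 7500 MB threshold (an 8GB machine usually reports slightly less than 8192 MB to the OS) are assumptions made for this example, not something Ambari itself requires.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: compare this machine against the suggested minimums.
# MIN_CPUS comes from the list above; MIN_RAM_MB is an assumed threshold
# slightly below 8192 because part of the RAM is reserved by the kernel.
MIN_CPUS=4
MIN_RAM_MB=7500

# Print OK or WARN depending on whether "actual" meets "minimum".
# (check_min is an illustrative helper name.)
check_min() {
  local label=$1 actual=$2 minimum=$3
  if [ "$actual" -ge "$minimum" ]; then
    echo "OK: $label $actual (minimum $minimum)"
  else
    echo "WARN: $label $actual (minimum $minimum)"
  fi
}

cpus=$(nproc)
# MemTotal in /proc/meminfo is reported in kB; convert to MB
ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)

check_min "CPU cores" "$cpus" "$MIN_CPUS"
check_min "RAM (MB)" "$ram_mb" "$MIN_RAM_MB"

# Show the installed Java version, if any
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "WARN: java not found"
fi
```

A WARN line does not stop anything. It is just a hint that services might struggle to start on that node.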

UNIX service configuration

This guide uses RedHat/CentOS 7 as the operating system. Let’s start…

  • SSH to your UNIX box and do the following:
    • First change the hostname to any desired name you would like:
      sudo nano /etc/hostname
    • Add the internal IP and hostname in the /etc/hosts file using your preferred text editor. (e.g. as below). If you have a DNS server, add the necessary A records as well for better internal network communication via hostnames.
      sudo nano /etc/hosts
      172.19.30.1 ambari.mintopsblog.local
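    • Disable SELinux and the firewall (set SELINUX=disabled inside /etc/selinux/config so that the change survives a reboot)
      sudo setenforce 0
      sudo nano /etc/selinux/config
      SELINUX=disabled
      sudo systemctl disable firewalld
      sudo systemctl stop firewalld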
  • Install the following packages:
    sudo yum -y install wget
    sudo yum -y install unzip
    sudo yum -y install ntp
    sudo yum -y install java-1.8.0-openjdk
    sudo yum -y remove chrony    # removed because on CentOS/RedHat 7 chrony can interfere with the NTP service
  • Update all the packages afterwards (optional)
    sudo yum -y update
  • Start and configure the NTP service
    sudo systemctl enable ntpd
    sudo systemctl start ntpd
    sudo timedatectl set-timezone UTC
  • Reboot the node to apply the configuration (mainly the hostname change)
    sudo reboot
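
Once the node is back up, it is worth confirming that the settings above actually took effect. The script below is a rough sketch of such a check. The selinux_config_mode helper is a name made up for this example, and the systemctl lines assume a systemd-based system such as CentOS/RedHat 7.

```shell
#!/usr/bin/env bash
# Post-reboot verification sketch for the steps above.

# Read the persistent SELinux mode from a config file.
# (selinux_config_mode is an illustrative helper name.)
selinux_config_mode() {
  local file=${1:-/etc/selinux/config}
  awk -F= '/^SELINUX=/ {print $2}' "$file"
}

# uname -n prints the node's hostname (same value as the hostname command)
name=$(uname -n)
echo "Hostname: $name"

# The hostname should resolve through /etc/hosts or DNS
getent hosts "$name" || echo "WARN: $name does not resolve"

if [ -r /etc/selinux/config ]; then
  echo "SELinux (config): $(selinux_config_mode)"
fi

# systemctl is-active prints the unit state, e.g. "active" or "inactive"
if command -v systemctl >/dev/null 2>&1; then
  echo "firewalld: $(systemctl is-active firewalld)"
  echo "ntpd: $(systemctl is-active ntpd)"
fi
```

On a node configured as in this guide you would expect the SELinux mode to be disabled, firewalld to be inactive and ntpd to be active.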

Ambari/HDP Installation

Now that we have the UNIX node configured with the necessary services, we can start with the Ambari installation.

The first thing to do is to grab the repository. Hortonworks provides the repository URLs for each version and OS. This guide uses Ambari 2.6, the latest version at the time of writing: Hortonworks Ambari 2.6 Repository

sudo wget http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.6.0.0/ambari.repo
sudo mv ambari.repo /etc/yum.repos.d/
yum repolist    # check that the repository has been added
  • Install the ambari-server service from the previously downloaded repository.
    sudo yum -y install ambari-server
  • Configure the ambari-server service (you can go with all defaults for now. This will use the “root” user for every configuration and installation done by Ambari itself).
    sudo ambari-server setup
  • Once it’s configured, you can now start the service.
    sudo ambari-server start
  • Go to http://*hostname*:8080 and log in to Ambari using the default username and password
    • Username: admin
    • Password: admin
  • Once logged in, select “Launch Install Wizard”
    • Name your cluster as desired (e.g. Production, Development, etc…)
    • Choose the HDP version you would like to install (every HDP version will have different service versions): HDP Information
    • The next window will ask you to provide the following:
      • A list of hostnames you would like to add to the cluster (N.B. one hostname per new line). Make sure that all the hosts can communicate/resolve each other using the hostname. Best way would be to add all the hostnames inside /etc/hosts on each node.
      • An RSA private key, which you need to set up so that the Ambari service has passwordless SSH access to the machines and can install the necessary Ambari Agents. This can be done as follows:
        • SSH to the Ambari Server UNIX node.
        • Escalate to the “root” user
          sudo su root
        • Create the passwordless SSH key with the following command
          ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
        • Once it is created, you would need to copy the contents of the ~/.ssh/id_rsa.pub of the Ambari server to the ~/.ssh/authorized_keys of all the nodes you will be adding to the cluster (including the Ambari server itself)
          cat ~/.ssh/id_rsa.pub
        • Copy the content and paste it inside ~/.ssh/authorized_keys. If the authorized_keys file does not exist, do the following:
          touch ~/.ssh/authorized_keys
          chmod 600 ~/.ssh/authorized_keys
        • You can test whether the passwordless SSH works by using the ssh command
          ssh root@*hostname*
      • Now that we have the RSA key, copy the content of the id_rsa file (the private key) and paste it into the Ambari web installation wizard.
        cat ~/.ssh/id_rsa
      • N.B – The passwordless SSH key is only needed during the Ambari agent installation and registration. Once the agents are installed and the cluster is deployed, you can remove the SSH public key from the authorized_keys.
    • Select “Register and Confirm”
    • Choose what services you would like to be installed. (Any service can also be installed after the cluster deployment)
    • Choose the nodes on which the services will be installed.
    • Choose the Datanode and Nodemanager hosts, i.e. which hosts will store the data (Datanodes) and which hosts will provide the cluster's computing resources in terms of CPU and RAM (Nodemanagers).
    • Customize the needed services. (Examples…)
      • Which hard disks will be used to store data. The Datanode stores data in specific directories. If a directory is a mount point, the service calculates the amount of disk space available on that mount. Just make sure that the same directory exists on every host that has the Datanode service installed. The calculated disk space is then added to the total capacity of the HDFS service.
      • Services marked with a red number require the user to enter a password
    • Review the deploy details that you have configured and, once satisfied, start the installation. It might take some time depending on how many services are being installed and on the specs of the machine.
    • Once done, Ambari will try to start all the services. If not all services are started you can then try to manually start them yourself.
    • You now have a simple HDP environment where you can start testing and get a better grasp of the configuration of each service.
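
If you have more than a couple of nodes, copying the public key by hand (as described in the passwordless SSH step above) gets tedious. The sketch below uses the standard ssh-copy-id tool to do it in a loop. The hosts.txt file (one hostname per line), the DRY_RUN switch and the copy_key_to_hosts function are assumptions made for this example, and it assumes root can still log in with a password on the target hosts.

```shell
#!/usr/bin/env bash
# Sketch: push the Ambari server's public key to every host in a list.
# HOSTS_FILE and DRY_RUN are illustrative settings for this example.
HOSTS_FILE=${HOSTS_FILE:-hosts.txt}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

copy_key_to_hosts() {
  local hosts_file=$1 host
  # Read the host list on file descriptor 3 so that stdin stays free
  # for the password prompt of ssh-copy-id.
  while IFS= read -r host <&3; do
    # Skip blank lines and comment lines
    case $host in ''|\#*) continue ;; esac
    if [ "$DRY_RUN" = 1 ]; then
      echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@$host"
    else
      ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$host"
    fi
  done 3< "$hosts_file"
}

if [ -f "$HOSTS_FILE" ]; then
  copy_key_to_hosts "$HOSTS_FILE"
fi
```

Run it once as is to see which commands would be executed, then run it again with DRY_RUN=0 as the root user on the Ambari server. You will be asked for the root password of each host once.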

 
