Apache NiFi is an open source project designed to automate the flow of data between systems.

This blog post is part of a complete guide divided into three separate posts.

The complete guide shows you how to install and configure an Apache NiFi instance with SSL, LDAP authentication, and policy permissions, and how to configure a NiFi cluster using either the embedded ZooKeeper service or an already configured ZooKeeper quorum in your environment.


Cluster configuration

If you followed my other posts, you should currently have a single NiFi instance configured with HTTPS/SSL and LDAP authentication, with the necessary policies set up for your use case.

Now we will configure a NiFi cluster that uses ZooKeeper to provide failover capabilities and keep the NiFi service available.

There are two methods of setting up a cluster in Apache NiFi:

  1. Using the embedded ZooKeeper service provided with the installation itself.
  2. Using an already configured ZooKeeper quorum in your environment.

Using the Embedded ZooKeeper

When using the embedded ZooKeeper service, the only way to make full use of ZooKeeper's capabilities is to have a quorum of either 3 or 5 nodes. A 3-node quorum tolerates the loss of one node (and a 5-node quorum the loss of two), so you can restart a NiFi service without worrying that the entire cluster goes down.

Important: you will need to set up another two machines with the same settings, HTTPS/SSL configuration and NiFi version to have a working 3-node cluster. Since we used a self-signed wildcard certificate on the first node, you can transfer the same files (nifi.properties, keystore.jks, truststore.jks, nifi-cert.pem, nifi-key.key) to the other two machines and put them under the /nifi/conf directory. If you are not using a wildcard certificate, you will need to trust the individual certificate of each host on every node for the communication to be successful.
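As a rough sketch, assuming the additional hosts are named nifi02 and nifi03 and that your user can write to /nifi/conf on them (adjust the hostnames and user to your environment), the files could be copied over with scp:

scp /nifi/conf/{nifi.properties,keystore.jks,truststore.jks,nifi-cert.pem,nifi-key.key} user@nifi02.mintopsblog.local:/nifi/conf/
scp /nifi/conf/{nifi.properties,keystore.jks,truststore.jks,nifi-cert.pem,nifi-key.key} user@nifi03.mintopsblog.local:/nifi/conf/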

Make sure that all the servers you will be using to configure the cluster can communicate with each other, and add the necessary entries to /etc/hosts so that the hostnames resolve to the correct IP addresses.
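For example, the /etc/hosts file on each node could contain entries like the following (the IP addresses and the nifi02/nifi03 hostnames are placeholders for your own environment):

192.168.1.11  nifi01.mintopsblog.local  nifi01
192.168.1.12  nifi02.mintopsblog.local  nifi02
192.168.1.13  nifi03.mintopsblog.local  nifi03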

Using your preferred text editor, edit the nifi.properties file:

sudo nano /nifi/conf/nifi.properties

Find the # State Management # section and set nifi.state.management.embedded.zookeeper.start to "true". This will enable the embedded ZooKeeper when NiFi is started/restarted on that host.

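The relevant part of nifi.properties should then look roughly like this (nifi.state.management.provider.cluster is already set to zk-provider by default and matches the cluster provider defined in state-management.xml):

# State Management #
nifi.state.management.provider.cluster=zk-provider
nifi.state.management.embedded.zookeeper.start=true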

Also find the # cluster node properties (only configure for cluster nodes) # section and set the following properties:

  • nifi.cluster.is.node=true
  • nifi.cluster.node.address=*put server hostname* (e.g. nifi01.mintopsblog.local)
  • nifi.zookeeper.connect.string=*put all the ZooKeeper server hostnames that you will configure, as a comma-separated list of host:port pairs (see the example below)*

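On nifi01, for example, these values could end up looking like the following (nifi02 and nifi03 are example hostnames, and nifi.cluster.node.protocol.port, which also needs a value on each cluster node, is shown here with an arbitrary example port):

nifi.cluster.is.node=true
nifi.cluster.node.address=nifi01.mintopsblog.local
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=nifi01.mintopsblog.local:2181,nifi02.mintopsblog.local:2181,nifi03.mintopsblog.local:2181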

Edit the zookeeper.properties file found in the conf directory to add all the servers that will run the embedded ZooKeeper service.

sudo nano /nifi/conf/zookeeper.properties

At the end of the file, add an entry for each server that will have the embedded ZooKeeper service enabled (see the example below).

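With three nodes, the added entries could look like this (2888 and 3888 are the standard ZooKeeper quorum and leader-election ports, and the hostnames are examples):

server.1=nifi01.mintopsblog.local:2888:3888
server.2=nifi02.mintopsblog.local:2888:3888
server.3=nifi03.mintopsblog.local:2888:3888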

Once that is done, we need to create another directory under the state directory, matching whatever the dataDir property is set to.

Take note of the dataDir property inside zookeeper.properties.

grep -i dataDir= /nifi/conf/zookeeper.properties

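In a default installation this typically returns something like the line below; the path is relative to the NiFi home directory (here /nifi), which is why we create /nifi/state/zookeeper in the next step. Adjust the directory if your value differs.

dataDir=./state/zookeeper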

sudo mkdir /nifi/state/zookeeper

Each host must be given an ID number that matches its entry in zookeeper.properties. If a host (e.g. nifi01) is set as server.1, that host needs to have the ID 1. Run the following command on each host, substituting its respective ID number.

echo 1 | sudo tee /nifi/state/zookeeper/myid
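For example, assuming nifi01, nifi02 and nifi03 were configured as server.1, server.2 and server.3 respectively, the other two nodes would get:

echo 2 | sudo tee /nifi/state/zookeeper/myid    # on nifi02 (server.2)
echo 3 | sudo tee /nifi/state/zookeeper/myid    # on nifi03 (server.3)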

Next, edit the state-management.xml file:

sudo nano /nifi/conf/state-management.xml

Find the <cluster-provider> element in the file and add all of your ZooKeeper servers (e.g. nifi01.mintopsblog.local:2181) to the <property name="Connect String"> value.

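The cluster-provider element would then look roughly like the following (the Root Node, Session Timeout and Access Control values shown are the shipped defaults and may differ slightly between NiFi versions; the hostnames are examples):

<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">nifi01.mintopsblog.local:2181,nifi02.mintopsblog.local:2181,nifi03.mintopsblog.local:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>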

Start all the NiFi instances at roughly the same time so that the cluster election can begin.

sudo /nifi/bin/nifi.sh start

In the NiFi logs you will first notice ZooKeeper trying to find the cluster coordinator; if none is found, a node is automatically assigned to become the coordinator. After that, an "election" is held to determine which node has the correct/latest flow so that it can become the primary node. Since this is a new cluster, all nodes should have a clean configuration, and a primary node will be chosen once the election timer runs out.


When the election is done, the NiFi cluster should be up and running. Log in to any NiFi host's URL and you should see the connected node count (e.g. 3/3) at the top left of the UI.


You can also check the cluster status from the global menu in the top right of the UI, under Cluster, which lists each node and its current state.


Any changes made to the NiFi flow are automatically replicated/synced to the other nodes.


Using an already configured ZooKeeper quorum

If you already have a ZooKeeper quorum in your environment, you can use it to configure the NiFi cluster. This actually makes things easier, since it only requires the following settings.

Edit the nifi.properties file:

sudo nano /nifi/conf/nifi.properties

Find the # cluster node properties (only configure for cluster nodes) # section and set the following properties:

  • nifi.cluster.is.node=true
  • nifi.cluster.node.address=*put server hostname* (e.g. nifi01.mintopsblog.local)
  • nifi.zookeeper.connect.string=*put all the ZooKeeper server hostnames of your existing quorum, as a comma-separated list of host:port pairs (see the example below)*

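For example, assuming the existing quorum runs on hosts named zk01 through zk03 (placeholder names) on the default client port 2181, and leaving the embedded ZooKeeper disabled, the values could look like this (the protocol port is again just an example):

nifi.state.management.embedded.zookeeper.start=false
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi01.mintopsblog.local
nifi.cluster.node.protocol.port=11443
nifi.zookeeper.connect.string=zk01.mintopsblog.local:2181,zk02.mintopsblog.local:2181,zk03.mintopsblog.local:2181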

Next, edit the state-management.xml file.

sudo nano /nifi/conf/state-management.xml

Find the <cluster-provider> element in the file and add all of your ZooKeeper servers (e.g. nifi01.mintopsblog.local:2181) to the <property name="Connect String"> value, just as in the embedded ZooKeeper section but pointing at your existing quorum.


Start the nodes by running the following:

cd /nifi/bin
sudo ./nifi.sh start

And that's all! The rest is taken care of by the NiFi startup process: it creates the necessary znodes in the ZooKeeper quorum and then automatically chooses which node becomes the cluster coordinator and which becomes the primary node.


I hope this guide helps you out; if you have any difficulties, don't hesitate to post a comment. Also, feel free to point out any mistakes or needed improvements in the guides.


5 comments

  1. For the statement "nifi.cluster.node.address=nifi01.mintopsblog.local", what value should I provide if I have more than one pod running?


  2. Hi Jack, for that statement you would need to input the actual server name where the NiFi application is installed (e.g. if your server is named test01.localhost, you would input test01.localhost). This then needs to be done for each and every NiFi node (e.g. test02.localhost, test03.localhost, etc.). Also make sure that the servers can communicate with each other using their DNS names (a good way of doing this is to add them to the /etc/hosts file). Hope that answers your question.


  3. I faced the following warning when using a Remote Process Group from a 2-node cluster:

    WARNING Unable to connect to https://host1.nifi:10443/nifi due to javax.net.ssl.SSLPeerUnverifiedException: Certificate for doesn’t match any of the subject alternative names: [*.nifi]

    My cluster consists of host1.nifi and host2.nifi. There seems to be no problem with cluster heartbeat checking or with other Processors or Process Groups.

    Could you please help me fix this issue?


  4. Hi Pham, one way of fixing this issue is to add SANs (Subject Alternative Names) to your self-signed certificate, since the Remote Process Group checks the keystore for a CN (Common Name) or SAN that matches the exact host (in your case, host1.nifi or host2.nifi). The Remote Process Group does this mainly to make sure the connection is being made by the actual host rather than by an unwanted party.


  5. OK, but I am trying to run the NiFi cluster in Kubernetes and I can't provide values to each node after deployment.

