Apache NiFi is an open-source project designed to automate the flow of data between systems.
This blog is part of a complete guide divided into 3 separate posts:
- Part 1: Apache NiFi – Basic installation with HTTPS/SSL & LDAP Configuration
- Part 2: Apache NiFi – Configure Data Flow & NiFi Policies
- Part 3: Apache NiFi – Cluster configuration
The complete guide shows you how to install and configure an Apache NiFi instance with SSL, LDAP authentication and policy permissions, and how to configure a NiFi cluster using either the embedded ZooKeeper service or an already configured ZooKeeper quorum in your environment.
Cluster configuration
If you followed my other posts, you should currently have a single NiFi instance configured with HTTPS/SSL and LDAP authentication, and have set up the necessary policies depending on your use case.
Now we will configure a NiFi cluster using ZooKeeper, giving the NiFi service failover capabilities and better uptime.
There are two methods of setting up a cluster in Apache NiFi:
- Using the embedded zookeeper service provided with the installation itself.
- Using an already configured zookeeper quorum in your environment.
Using the Embedded Zookeeper
When using the embedded ZooKeeper service, the only way to take full advantage of ZooKeeper's capabilities is to run a quorum of either 3 or 5 nodes. With a quorum, you can restart the NiFi service on a single node without worrying that the entire cluster goes down.
Important: you will need to set up another 2 machines with the same settings, HTTPS/SSL configuration and NiFi version to have a working 3 node cluster. Since we used a self-signed wildcard certificate on the first node, you can transfer the same files to the other 2 machines (nifi.properties, keystore.jks, truststore.jks, nifi-cert.pem, nifi-key.key) and put them under the /nifi/conf directory. If you are not using a wildcard certificate, you will need to trust the individual certificates on each host for the communication to be successful.
Make sure that all the servers you will be using to configure the cluster can communicate with each other, and add the necessary entries in /etc/hosts so that the hostnames resolve to the correct IP addresses.
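For example, a minimal /etc/hosts sketch for a 3 node cluster (the IP addresses and the nifi02/nifi03 hostnames below are placeholders, adjust them to your own environment):
192.168.1.11   nifi01.mintopsblog.local   nifi01
192.168.1.12   nifi02.mintopsblog.local   nifi02
192.168.1.13   nifi03.mintopsblog.local   nifi03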
Using your preferred text editor, edit the nifi.properties:
sudo nano /nifi/conf/nifi.properties
Find the # State Management # section and set nifi.state.management.embedded.zookeeper.start to "true". This will enable the embedded ZooKeeper when NiFi is started/restarted on that host.
Also find the # cluster node properties (only configure for cluster nodes) # section and set the following configs (a sketch of the resulting properties is shown after this list):
- nifi.cluster.is.node=true
- nifi.cluster.node.address=*put server hostname* (e.g. nifi01.mintopsblog.local)
- nifi.zookeeper.connect.string=*put all the zookeeper server hostnames that you will configure, as a comma-separated list of host:port pairs*
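As a reference, here is a minimal sketch of how these properties could look on the first node. The nifi02/nifi03 hostnames and the protocol port are example values; nifi.cluster.node.protocol.port is not covered in the steps above, but it must be set on cluster nodes and can be any free port (the same port can be used on every node):
# State Management #
nifi.state.management.embedded.zookeeper.start=true

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi01.mintopsblog.local
nifi.cluster.node.protocol.port=11443

# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=nifi01.mintopsblog.local:2181,nifi02.mintopsblog.local:2181,nifi03.mintopsblog.local:2181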
Edit the zookeeper.properties file found in the conf directory to add all the servers which will have the embedded ZooKeeper service.
sudo nano /nifi/conf/zookeeper.properties
At the end of the file, add the server hostnames of all the servers which will have the embedded ZooKeeper service enabled.
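As a sketch, assuming the three example hostnames used above and the default ZooKeeper peer/election ports 2888 and 3888, the entries would look something like this:
server.1=nifi01.mintopsblog.local:2888:3888
server.2=nifi02.mintopsblog.local:2888:3888
server.3=nifi03.mintopsblog.local:2888:3888
Depending on your NiFi/ZooKeeper version, the client port may instead need to be appended to each entry (e.g. nifi01.mintopsblog.local:2888:3888;2181) rather than set through the clientPort property.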
Once that is all done, we need to create another directory under the state directory, matching whatever the dataDir property is set to.
Take note of the dataDir property inside zookeeper.properties.
cat /nifi/conf/zookeeper.properties | grep -i dataDir=
sudo mkdir /nifi/state/zookeeper
Depending on how you configured the servers inside zookeeper.properties, you need to give an ID number to each host. If a host (e.g. nifi01) is set as server.1, that host needs to have the ID 1. Run the following command on each host, giving each its respective ID number.
echo 1 > /nifi/state/zookeeper/myid
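For example, on the host you configured as server.2, the ID would be 2 (if the directory is owned by root, you may need to write the file with sudo tee instead of a plain redirect):
echo 2 > /nifi/state/zookeeper/myid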
Next, edit the state-management.xml file:
sudo nano /nifi/conf/state-management.xml
Find the <cluster-provider> element in the file and add all your ZooKeeper servers (e.g. nifi01.mintopsblog.local:2181) in the <property name="Connect String"> property.
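For reference, here is a sketch of the cluster-provider block with the default values NiFi ships with, pointing at the three example hostnames used in this post:
<cluster-provider>
    <id>zk-provider</id>
    <class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
    <property name="Connect String">nifi01.mintopsblog.local:2181,nifi02.mintopsblog.local:2181,nifi03.mintopsblog.local:2181</property>
    <property name="Root Node">/nifi</property>
    <property name="Session Timeout">10 seconds</property>
    <property name="Access Control">Open</property>
</cluster-provider>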
Start NiFi on all the nodes at roughly the same time so that the cluster election can start.
sudo /nifi/bin/nifi.sh start
In the NiFi logs you will first notice that the nodes use ZooKeeper to look for an existing cluster coordinator; if none is found, one of the nodes is automatically assigned as coordinator. After that, an "election" is held to determine which node has the correct/latest flow so that it can become the primary node. Since this is a new cluster, all nodes should have a clean configuration and a primary node will be chosen once the election timer runs out.
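As a side note, that election timer is controlled by the flow election properties in nifi.properties; the sketch below shows their default values, which you can tune if you want the election to finish sooner (for example by setting the maximum number of candidates to your node count):
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=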
When the election is done, the NiFi cluster should be up and running. Log in to any of the NiFi host URLs and you should see the number of connected cluster nodes on the top left:
You can also check the cluster status from the global menu in the top right, under Cluster:
Any changes made to the NiFi flows will automatically be replicated/synced to the other nodes.
Using an already configured Zookeeper quorum
If you already have a ZooKeeper quorum in your environment, you can use it to configure the NiFi cluster instead of the embedded service. This actually makes things easier, since it only requires the following settings.
Edit the nifi.properties file:
sudo nano /nifi/conf/nifi.properties
Find the # cluster node properties (only configure for cluster nodes) # section and set the following configs (a sketch follows the list):
- nifi.cluster.is.node=true
- nifi.cluster.node.address=*put server hostname* (e.g. nifi01.mintopsblog.local)
- nifi.zookeeper.connect.string=*put all the zookeeper server hostnames, as a comma-separated list of host:port pairs*
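As a sketch, assuming an existing external quorum on hosts zk01/zk02/zk03.mintopsblog.local (placeholder names, and an example protocol port), the properties could look like this; note that nifi.state.management.embedded.zookeeper.start stays set to false in this scenario:
nifi.state.management.embedded.zookeeper.start=false

nifi.cluster.is.node=true
nifi.cluster.node.address=nifi01.mintopsblog.local
nifi.cluster.node.protocol.port=11443

nifi.zookeeper.connect.string=zk01.mintopsblog.local:2181,zk02.mintopsblog.local:2181,zk03.mintopsblog.local:2181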
Next, edit the state-management.xml file.
sudo nano /nifi/conf/state-management.xml
Find the <cluster-provider> element in the file and add all your existing ZooKeeper servers in the <property name="Connect String"> property (see the cluster-provider sketch in the embedded ZooKeeper section above for the surrounding XML).
Start the nodes by running the following:
cd /nifi/bin
sudo ./nifi.sh start
And that’s all! The rest is taken care of by the NiFi startup process. It will create the necessary znodes in the ZooKeeper quorum and then automatically choose which node becomes the cluster coordinator and which becomes the primary node.
Hope this guide helps you out; if you have any difficulties, don’t hesitate to post a comment. Also, if you spot any mistakes or possible improvements in the guides, feel free to point them out.