PrestoDB – Basic Installation with HTTPS/SSL Configuration

PrestoDB is an open source distributed SQL query engine, originally developed at Facebook, that lets multiple workers pool their resources to query various services (e.g. Hive, Kafka, etc.) while distributing and parallelizing the load.

More info can be found at PrestoDB.

This guide shows how to start with a small PrestoDB environment (even a single-node server) that can be used for testing purposes. You can then reuse it to build a larger production cluster once you have a good grasp of what each service and configuration setting does.


Pre-Requisites

You will need the following before starting:

  • PrestoDB
  • Machine specifications (if you are installing everything on a single node, use the values below)
    • CPU: 4
    • RAM: 8GB
  • Operating System:
    • Linux (CentOS, Red Hat, etc.)
  • Java:
    • Version 1.8

UNIX Service configuration

This guide uses CentOS. Let’s start by installing the services that PrestoDB requires.

  • SSH to your UNIX box and do the following:
  • First, change the hostname to a name of your choice
    sudo nano /etc/hostname
  • Add the internal IP and hostname in the /etc/hosts file using your preferred text editor. (e.g. as below). If you have a DNS server, add the necessary A records as well for better internal network communication via hostnames.
    sudo nano /etc/hosts
    172.19.30.1 presto.mintopsblog.local
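  • Disable SELinux and the firewall (setenforce 0 takes effect immediately; the config file change keeps it disabled after the reboot)
    sudo setenforce 0
    sudo nano /etc/selinux/config
    SELINUX=disabled
    sudo systemctl disable firewalld
    sudo systemctl stop firewalld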
  • Install the following services:
    sudo yum -y install wget
    sudo yum -y install unzip
    sudo yum -y install ntp
    sudo yum -y install java-1.8.0-openjdk
  • Update all the packages afterwards (optional)
    sudo yum -y update
  • Start and configure the NTP service
    sudo systemctl enable ntpd
    sudo systemctl start ntpd
    sudo timedatectl set-timezone UTC
  • Reboot the node so that the configuration changes take effect (mainly the hostname change)
    sudo reboot
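
Once the node is back up, it can be worth confirming that it ended up in the state the steps above aim for. The following is only a minimal sketch and not part of the original steps: the helper name java_major is made up for illustration, and the hostname is the example one added to /etc/hosts above.

```shell
#!/bin/sh
# Extract the Java major version from the first line of `java -version` output.
# Handles the legacy "1.8.0_xxx" scheme as well as the newer "11.0.x" scheme.
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;
    "")  return 1 ;;
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;
  esac
}

# Only query the local JVM if one is installed.
if command -v java >/dev/null 2>&1; then
  echo "Java major version: $(java_major "$(java -version 2>&1 | head -n 1)")"
else
  echo "java not found on PATH"
fi

# Check that the example hostname from /etc/hosts resolves (assumed name).
if command -v getent >/dev/null 2>&1 && getent hosts presto.mintopsblog.local >/dev/null; then
  echo "hostname resolves"
else
  echo "hostname does not resolve yet"
fi
```

For this guide the reported major version should be 8, since the prerequisites call for Java 1.8.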

PrestoDB Coordinator Basic Installation & Configuration

Now that we have the UNIX node configured with the necessary services, we can start with the PrestoDB installation.

  • First, download the PrestoDB TAR package from their website (PrestoDB Tarball). This guide uses version 0.198.
    sudo wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.198/presto-server-0.198.tar.gz
  • Extract the tarball and move it into your preferred directory
    sudo tar xvf presto-server-0.198.tar.gz
    sudo mkdir -p /opt/presto
    sudo mv presto-server-0.198 /opt/presto/
  • This leaves the server under /opt/presto/presto-server-0.198

  • Now create the configuration files for PrestoDB. Run the following as root:
    echo -e "presto soft nofile 64000" >> /etc/security/limits.conf
    echo -e "presto hard nofile 64000" >> /etc/security/limits.conf
    mkdir -p /opt/presto/presto-server-0.198/etc
    touch /opt/presto/presto-server-0.198/etc/config.properties
    touch /opt/presto/presto-server-0.198/etc/jvm.config
    touch /opt/presto/presto-server-0.198/etc/node.properties
    touch /opt/presto/presto-server-0.198/etc/log.properties
  • After running these commands, the etc directory should contain config.properties, jvm.config, node.properties and log.properties

  • Now add the settings to each file, starting with config.properties (use the variant that matches the role of the node)
    • Single Node configuration (to be used as both coordinator and worker node)
      cat > /opt/presto/presto-server-0.198/etc/config.properties << EOF
      coordinator=true
      node-scheduler.include-coordinator=true
      http-server.https.enabled=true
      http-server.https.port=5665
      query.max-memory=5GB
      query.max-memory-per-node=1GB
      discovery-server.enabled=true
      discovery.uri=https://presto.mintopsblog.local:5665
      node.internal-address=presto.mintopsblog.local
      http-server.https.keystore.path=/opt/presto/ssl/presto.jks
      http-server.https.keystore.key=sslpassphrase
      internal-communication.https.required=true
      internal-communication.https.keystore.path=/opt/presto/ssl/presto.jks
      internal-communication.https.keystore.key=sslpassphrase
      http-server.https.secure-random-algorithm=SHA1PRNG
      EOF
    • Coordinator node configuration (to be used only as the PrestoDB coordinator)
      cat > /opt/presto/presto-server-0.198/etc/config.properties << EOF
      coordinator=true
      node-scheduler.include-coordinator=false
      http-server.https.enabled=true
      http-server.https.port=5665
      query.max-memory=80GB
      query.max-memory-per-node=8GB
      discovery-server.enabled=true
      discovery.uri=https://presto.mintopsblog.local:5665
      node.internal-address=presto.mintopsblog.local
      http-server.https.keystore.path=/opt/presto/ssl/presto.jks
      http-server.https.keystore.key=sslpassphrase
      internal-communication.https.required=true
      internal-communication.https.keystore.path=/opt/presto/ssl/presto.jks
      internal-communication.https.keystore.key=sslpassphrase
      http-server.https.secure-random-algorithm=SHA1PRNG
      EOF
    • Worker node configuration (to be used only as a PrestoDB worker)
      cat > /opt/presto/presto-server-0.198/etc/config.properties << EOF
      coordinator=false
      http-server.https.enabled=true
      http-server.https.port=5665
      query.max-memory=80GB
      query.max-memory-per-node=8GB
      discovery-server.enabled=true
      discovery.uri=https://presto.mintopsblog.local:5665
      node.internal-address=worker.mintopsblog.local
      http-server.https.keystore.path=/opt/presto/ssl/presto.jks
      http-server.https.keystore.key=sslpassphrase
      internal-communication.https.required=true
      internal-communication.https.keystore.path=/opt/presto/ssl/presto.jks
      internal-communication.https.keystore.key=sslpassphrase
      http-server.https.secure-random-algorithm=SHA1PRNG
      EOF
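
Since all three variants set internal-communication.https.required=true, the discovery.uri has to be an https:// URL as well, and a typo there is easy to miss. Here is a small sketch of a check for that. The helper names get_prop and check_discovery_uri are hypothetical, and the demo runs against a throwaway file rather than the real config.properties.

```shell
#!/bin/sh
# Print the value of property $2 from properties file $1 (last occurrence wins).
get_prop() {
  key=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the regex
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Fail if internal HTTPS is required but discovery.uri is not an https:// URL.
check_discovery_uri() {
  uri=$(get_prop "$1" discovery.uri)
  required=$(get_prop "$1" internal-communication.https.required)
  if [ "$required" = "true" ]; then
    case "$uri" in
      https://*) ;;
      *) echo "discovery.uri is not https: $uri" >&2; return 1 ;;
    esac
  fi
  echo "discovery.uri looks consistent: $uri"
}

# Example run against a throwaway file containing two of the settings above.
demo=$(mktemp)
printf '%s\n' 'internal-communication.https.required=true' \
  'discovery.uri=https://presto.mintopsblog.local:5665' > "$demo"
check_discovery_uri "$demo"
rm -f "$demo"
```

On a real node you would point it at the actual file, e.g. check_discovery_uri /opt/presto/presto-server-0.198/etc/config.properties.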
  • Let’s create the self-signed SSL certificate and the keystore in the path specified in config.properties
    sudo mkdir -p /opt/presto/ssl
    cd /opt/presto/ssl
    
    openssl req -newkey rsa:2048 -nodes -keyout privatekey.key -x509 -days 365 -out certificate.crt -passin pass:sslpassphrase -subj "/OU=presto/CN=presto.mintopsblog.local/"
    openssl pkcs12 -inkey privatekey.key -in certificate.crt -export -out bundle.p12 -passin pass:sslpassphrase -passout pass:sslpassphrase
    keytool -noprompt -importkeystore -srckeystore bundle.p12 -srcstoretype pkcs12 -srcstorepass sslpassphrase -destkeystore presto.jks -deststoretype JKS -deststorepass sslpassphrase
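
Before wiring the keystore into Presto, you may want to inspect the certificate that was just generated. This is a rough sketch under a few assumptions: the helper name cert_ok is made up, the certificate sits in the path used above, and the check is skipped if openssl or the file is missing.

```shell
#!/bin/sh
# Return success if certificate $1 mentions host name $2 in its subject
# and will still be valid 24 hours (86400 seconds) from now.
cert_ok() {
  openssl x509 -in "$1" -noout -subject | grep -qF "$2" &&
    openssl x509 -in "$1" -noout -checkend 86400 >/dev/null
}

crt=/opt/presto/ssl/certificate.crt
if [ -f "$crt" ] && command -v openssl >/dev/null 2>&1; then
  if cert_ok "$crt" presto.mintopsblog.local; then
    echo "certificate matches the coordinator host name and is not about to expire"
  else
    echo "certificate subject or validity looks wrong" >&2
  fi
else
  echo "skipping check: $crt or openssl not available"
fi
```

The contents of the resulting keystore can be listed in a similar way with keytool -list -keystore presto.jks -storepass sslpassphrase.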
  • Configure the jvm.config next (give it as much Java heap space as you actually need):
    cat > /opt/presto/presto-server-0.198/etc/jvm.config << EOF
    -server
    -Xmx4G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    EOF
  • Configure the log.properties with the following:
    echo -e "com.facebook.presto=INFO" > /opt/presto/presto-server-0.198/etc/log.properties
  • And lastly the node.properties config (you can generate a UUID with the command ‘uuidgen’; the node.id must be unique for every node):
    cat > /opt/presto/presto-server-0.198/etc/node.properties << EOF
    node.environment=mintopsblog
    node.id=5a511258-3e86-4496-a2bb-21c8916871e8
    node.data-dir=/opt/presto/presto-data
    EOF
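
When you roll this out to several nodes, copying the same node.properties everywhere would give every node the same node.id. One way around that is to generate the file per node. The sketch below is only an illustration: the function names new_uuid and write_node_properties are hypothetical, and the demo writes into a temporary directory instead of the real etc directory.

```shell
#!/bin/sh
# Print a fresh lowercase UUID, using uuidgen when available and the
# Linux kernel's random UUID source otherwise.
new_uuid() {
  if command -v uuidgen >/dev/null 2>&1; then
    uuidgen | tr 'A-Z' 'a-z'
  else
    cat /proc/sys/kernel/random/uuid
  fi
}

# Write node.properties into directory $1 with environment name $2
# and data directory $3, generating a new node.id on every call.
write_node_properties() {
  cat > "$1/node.properties" << EOF
node.environment=$2
node.id=$(new_uuid)
node.data-dir=$3
EOF
}

# Demo in a throwaway directory.
demo=$(mktemp -d)
write_node_properties "$demo" mintopsblog /opt/presto/presto-data
cat "$demo/node.properties"
rm -rf "$demo"
```

Keep the generated file once a node has been started, so that the node keeps the same node.id across restarts.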
  • The core configuration is now in place, but you still need a few small configs to connect PrestoDB to a service (e.g. Hive, Kafka, etc.)
  • To do this, create another directory called ‘catalog‘ under the ‘etc‘ directory of Presto. This directory holds the connection configuration for each service that PrestoDB will query. (Make sure that there is network connectivity between the Presto server and the service you would like Presto to query.)
    mkdir -p /opt/presto/presto-server-0.198/etc/catalog
  • Presto/Kafka Connector configuration

    cat > /opt/presto/presto-server-0.198/etc/catalog/kafka.properties << EOF
    connector.name=kafka
    kafka.nodes=kafka.mintopsblog.local:9092
    kafka.table-names=mintopsblogtest
    kafka.hide-internal-columns=false
    EOF
  • You can also set up a JSON file describing the topic attributes so that Presto can expose each attribute as its own column, e.g.:
    mkdir -p /opt/presto/presto-server-0.198/etc/kafka
    cat > /opt/presto/presto-server-0.198/etc/kafka/mintopsblogtest.json << EOF
    {
      "tableName": "mintopsblogtest",
      "schemaName": "default",
      "topicName": "mintopsblogtest",
      "key": {
        "dataFormat": "raw",
        "fields": [
          {
            "name": "kafka_key",
            "dataFormat": "LONG",
            "type": "BIGINT",
            "hidden": "false"
          }
        ]
      },
      "message": {
        "dataFormat": "json",
        "fields": [
          {
            "name": "firstname",
            "mapping": "firstName",
            "type": "VARCHAR"
          },
          {
            "name": "lastname",
            "mapping": "lastName",
            "type": "VARCHAR"
          },
          {
            "name": "timestamp",
            "mapping": "timestamp",
            "type": "TIMESTAMP",
            "dataFormat": "iso8601"
          }
        ]
      }
    }
    EOF
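
To make the mapping a bit more concrete, here is what a matching Kafka message could look like. The values are invented purely for illustration; only the key names come from the "mapping" entries in the topic description. Each JSON key in the message ends up in the Presto column whose "mapping" names it (firstName in firstname, and so on).

```shell
#!/bin/sh
# A sample message whose JSON keys match the "mapping" entries of the
# topic description (the values are hypothetical).
msg='{"firstName":"Jane","lastName":"Doe","timestamp":"2018-04-01T12:00:00.000Z"}'

# List the top-level keys (plain text extraction, good enough for this flat sample).
printf '%s\n' "$msg" | grep -o '"[A-Za-z]*":' | tr -d '":'
```

Publishing a message like this to the mintopsblogtest topic (for example with Kafka's console producer) should give you a row to look at once the client is connected.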
  • Presto/Hive Connector configuration

    cat > /opt/presto/presto-server-0.198/etc/catalog/hive.properties << EOF
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive.mintopsblog.local:9093
    hive.config.resources=/opt/presto/hive/core-site.xml,/opt/presto/hive/hdfs-site.xml
    EOF
  • Presto Connectors Documentation

    • You can find more Presto connector documentation and configuration at the following URL: PrestoDB Connectors

  • After everything is configured, the PrestoDB Coordinator server can be launched (this assumes that a presto user exists and owns the /opt/presto directory).
    su presto -c "/opt/presto/presto-server-0.198/bin/launcher start"
  • You can then see the logs in the previously configured “presto-data” directory
    tail -f /opt/presto/presto-data/var/log/launcher.log
    tail -f /opt/presto/presto-data/var/log/server.log
  • The web UI is then available at https://hostname:port (e.g. https://presto.mintopsblog.local:5665)
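
Startup can take a little while, so instead of refreshing the browser you could poll the server from the shell. This is just a sketch: the function name wait_for_presto is made up, it assumes curl is installed, and it relies on Presto answering on its /v1/info REST endpoint once it is up. The -k flag skips certificate verification, which is only acceptable here because the certificate is self-signed and this is a test setup.

```shell
#!/bin/sh
# Poll $1/v1/info once per second, up to $2 attempts.
# Returns 0 as soon as the server answers with an HTTP success status.
wait_for_presto() {
  url=$1
  tries=$2
  while [ "$tries" -gt 0 ]; do
    if curl -ksf --max-time 5 -o /dev/null "$url/v1/info"; then
      echo "Presto is answering at $url"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "Presto did not answer at $url" >&2
  return 1
}

# Usage (example host and port from this guide):
# wait_for_presto https://presto.mintopsblog.local:5665 60
```

If it times out, the launcher.log and server.log files mentioned above are the first place to look.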

PrestoDB Client Connectivity

Once the PrestoDB coordinator and workers are started and working correctly, you can use the Presto client to start running some queries.

  • Download the PrestoDB client from the following URL: PrestoDB Client
    sudo wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.198/presto-cli-0.198-executable.jar
    sudo mkdir -p /opt/presto/presto-client
    sudo mv presto-cli-0.198-executable.jar /opt/presto/presto-client/presto
    sudo chmod +x /opt/presto/presto-client/presto
  • Then run the presto client with the necessary arguments (you can change the catalog depending on which service you would like to query)
    /opt/presto/presto-client/presto --server https://presto.mintopsblog.local:5665 --catalog kafka --schema default --truststore-path /opt/presto/ssl/presto.jks --truststore-password sslpassphrase
  • After connecting, you can run a simple select query and watch it run on the coordinator (replace mintopsblogtest with your own topic/table name)
    select * from mintopsblogtest limit 1;
  • More details in the following documentation: Kafka Connector
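
If you would rather run queries from a script than from the interactive prompt, the client also accepts an --execute option. The wrapper below is a small sketch: the function name presto_query and the PRESTO_CLI variable are hypothetical, while the server, truststore and password values are simply the ones used earlier in this guide.

```shell
#!/bin/sh
# Path to the Presto CLI; can be overridden from the environment.
PRESTO_CLI=${PRESTO_CLI:-/opt/presto/presto-client/presto}

# Run a single SQL statement ($2) against catalog $1 and exit.
presto_query() {
  "$PRESTO_CLI" \
    --server https://presto.mintopsblog.local:5665 \
    --catalog "$1" \
    --schema default \
    --truststore-path /opt/presto/ssl/presto.jks \
    --truststore-password sslpassphrase \
    --execute "$2"
}

# Usage:
# presto_query kafka 'select * from mintopsblogtest limit 1'
```

Keeping the connection flags in one place also makes it easier to switch catalogs, e.g. presto_query hive 'show schemas'.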

Hope this guide helps you out. If you have any difficulties, don’t hesitate to post a comment. Also, feel free to point out any needed improvements or mistakes in the guide.

2 comments

  1. Hi Elton Atkins,
    I have a question about the Presto configuration. I am trying to integrate a Presto cluster with AD/LDAP. In your blog, I see that you are using a self-signed certificate. I am able to do authentication on a single node, but I am not able to create a multi-node Presto cluster running on HTTPS.
    I am using a .cer file.

    Thanks,
    Datta


  2. Hi Datta, for cluster HTTPS configuration, the best approach is to use JKS keystores and add your .cer file to the keystore itself. This keystore then needs to be the same on all the hosts (e.g. if you have .cer files for prestodb01.test.com and prestodb02.test.com, you would add both to the keystore, transfer the keystore to both hosts and point to it in your config.properties file). This is needed so that both hosts can trust each other over SSL. The easiest way to test this is to create a self-signed wildcard certificate (e.g. *.test.com) and use the same keystore on all your hosts, without having to manage many .cer files.
