PrestoDB – Basic Installation with HTTPS/SSL Configuration

PrestoDB is an open source distributed SQL query engine developed by Facebook. It lets multiple workers pool their resources to query various services (e.g. Hive, Kafka) while distributing and parallelizing the load.

More info can be found on the PrestoDB website.

This guide will show you how you can start with a small PrestoDB environment (it can also be a one node server) which can be used for testing purposes. You can then re-use this guide to productionize a larger cluster once you have a good grasp of what each service and configuration provides.


Pre-Requisites

You will need the following before starting:

  • PrestoDB
  • Machine specifications (if you’re installing everything on a single node, use the values below)
    • CPU: 4
    • RAM: 8GB
  • Operating System:
    • Linux (CentOS, Redhat, etc…)
  • JAVA:
    • Version 1.8

UNIX Service configuration

This guide uses CentOS. Let’s start by installing the services that PrestoDB requires.

  • SSH to your UNIX box and do the following:
  • First change the hostname to any desired name you would like
    sudo nano /etc/hostname
  • Add the internal IP and hostname in the /etc/hosts file using your preferred text editor. (e.g. as below). If you have a DNS server, add the necessary A records as well for better internal network communication via hostnames.
    sudo nano /etc/hosts
    172.19.30.1 presto.mintopsblog.local
  • Disable SELinux and the firewall
    sudo setenforce 0
    sudo nano /etc/selinux/config
    SELINUX=disabled
    sudo systemctl disable firewalld
    sudo systemctl stop firewalld
  • Install the following services:
    sudo yum -y install wget
    sudo yum -y install unzip
    sudo yum -y install ntp
    sudo yum -y install java-1.8.0-openjdk
  • Update all the packages afterwards (optional)
    sudo yum -y update
  • Start and configure the NTP service
    sudo systemctl enable ntpd
    sudo systemctl start ntpd
    timedatectl set-timezone UTC
  • Reboot the node so that the changes take effect (mainly the hostname change).
    sudo reboot
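After the reboot it is worth confirming that the hostname mapping from the steps above actually landed in the hosts file. The following is only a minimal sketch: the function name `has_hosts_entry` is made up for this example, and it only inspects a hosts-style file, it does not query DNS.

```shell
#!/bin/sh
# has_hosts_entry HOSTS_FILE NAME   (hypothetical helper, not part of PrestoDB)
# Returns 0 if HOSTS_FILE has a non-comment line that maps an IP to NAME.
has_hosts_entry() {
    hosts_file=$1
    name=$2
    awk -v name="$name" '
        /^[ \t]*#/ { next }                 # skip full-line comments
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at an inline comment
                if ($i == name) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}
```

For example, `has_hosts_entry /etc/hosts presto.mintopsblog.local || echo "hosts entry missing"` prints a warning when the entry from this guide is absent.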

PrestoDB Coordinator Basic Installation & Configuration

Now that we have the UNIX node configured with the necessary services, we can start with the PrestoDB installation.

  • First up, get the latest PrestoDB TAR package from their website – PrestoDB Tarball
    sudo wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.198/presto-server-0.198.tar.gz
  • Extract the tarball and move it in any preferred directory
    sudo tar xvf presto-server-0.198.tar.gz
    sudo mkdir -p /opt/presto
    sudo mv presto-server-0.198 /opt/presto/
  • You should end up with the following structure:

[Screenshot: Presto1]

  • Now we would need to create the necessary configuration files for PrestoDB. Do the following:
    echo -e "presto soft nofile 64000" >> /etc/security/limits.conf
    echo -e "presto hard nofile 64000" >> /etc/security/limits.conf
    mkdir -p /opt/presto/presto-server-0.198/etc
    touch /opt/presto/presto-server-0.198/etc/config.properties
    touch /opt/presto/presto-server-0.198/etc/jvm.config
    touch /opt/presto/presto-server-0.198/etc/node.properties
    touch /opt/presto/presto-server-0.198/etc/log.properties
  • After running the previous commands you should end up with the following file structure:

[Screenshot: Presto2]

  • We can now add the necessary configs to each file, starting with config.properties. (Use whichever variant below matches the role of the node.)
    • Single Node configuration (to be used as both coordinator and worker node)
      cat > /opt/presto/presto-server-0.198/etc/config.properties << EOF
      coordinator=true
      node-scheduler.include-coordinator=true
      http-server.https.enabled=true
      http-server.https.port=5665
      query.max-memory=5GB
      query.max-memory-per-node=1GB
      discovery-server.enabled=true
      discovery.uri=https://presto.mintopsblog.local:5665
      node.internal-address=presto.mintopsblog.local
      http-server.https.keystore.path=/opt/presto/ssl/presto.jks
      http-server.https.keystore.key=sslpassphrase
      internal-communication.https.required=true
      internal-communication.https.keystore.path=/opt/presto/ssl/presto.jks
      internal-communication.https.keystore.key=sslpassphrase
      http-server.https.secure-random-algorithm=SHA1PRNG
      EOF
    • Coordinator node configuration (to be used only as the PrestoDB coordinator)
      cat > /opt/presto/presto-server-0.198/etc/config.properties << EOF
      coordinator=true
      node-scheduler.include-coordinator=false
      http-server.https.enabled=true
      http-server.https.port=5665
      query.max-memory=80GB
      query.max-memory-per-node=8GB
      discovery-server.enabled=true
      discovery.uri=https://presto.mintopsblog.local:5665
      node.internal-address=presto.mintopsblog.local
      http-server.https.keystore.path=/opt/presto/ssl/presto.jks
      http-server.https.keystore.key=sslpassphrase
      internal-communication.https.required=true
      internal-communication.https.keystore.path=/opt/presto/ssl/presto.jks
      internal-communication.https.keystore.key=sslpassphrase
      http-server.https.secure-random-algorithm=SHA1PRNG
      EOF
    • Worker node configuration (to be used only as a PrestoDB worker)
      cat > /opt/presto/presto-server-0.198/etc/config.properties << EOF
      coordinator=false
      http-server.https.enabled=true
      http-server.https.port=5665
      query.max-memory=80GB
      query.max-memory-per-node=8GB
      discovery-server.enabled=true
      discovery.uri=https://presto.mintopsblog.local:5665
      node.internal-address=worker.mintopsblog.local
      http-server.https.keystore.path=/opt/presto/ssl/presto.jks
      http-server.https.keystore.key=sslpassphrase
      internal-communication.https.required=true
      internal-communication.https.keystore.path=/opt/presto/ssl/presto.jks
      internal-communication.https.keystore.key=sslpassphrase
      http-server.https.secure-random-algorithm=SHA1PRNG
      EOF
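The three variants above differ only in a handful of lines, so they can be generated from one template. The sketch below is just one way to do that: `render_config` is a hypothetical helper name, and the values are copied from the three listings above rather than being recommendations.

```shell
#!/bin/sh
# render_config ROLE NODE_HOST COORDINATOR_HOST   (hypothetical helper)
# ROLE is one of: single, coordinator, worker. Prints config.properties to stdout.
render_config() {
    role=$1
    node_host=$2
    coordinator_host=$3

    # Role-specific lines and memory limits, as in the listings above.
    case "$role" in
        single)
            echo "coordinator=true"
            echo "node-scheduler.include-coordinator=true"
            max_memory=5GB; per_node=1GB ;;
        coordinator)
            echo "coordinator=true"
            echo "node-scheduler.include-coordinator=false"
            max_memory=80GB; per_node=8GB ;;
        worker)
            echo "coordinator=false"
            max_memory=80GB; per_node=8GB ;;
        *)
            echo "unknown role: $role" >&2
            return 1 ;;
    esac

    # Lines shared by every role.
    cat <<EOF
http-server.https.enabled=true
http-server.https.port=5665
query.max-memory=${max_memory}
query.max-memory-per-node=${per_node}
discovery-server.enabled=true
discovery.uri=https://${coordinator_host}:5665
node.internal-address=${node_host}
http-server.https.keystore.path=/opt/presto/ssl/presto.jks
http-server.https.keystore.key=sslpassphrase
internal-communication.https.required=true
internal-communication.https.keystore.path=/opt/presto/ssl/presto.jks
internal-communication.https.keystore.key=sslpassphrase
http-server.https.secure-random-algorithm=SHA1PRNG
EOF
}
```

A worker file could then be written with `render_config worker worker.mintopsblog.local presto.mintopsblog.local > /opt/presto/presto-server-0.198/etc/config.properties`.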
  • Let’s create the necessary SSL certificate in the same path specified in the config.properties
    sudo mkdir -p /opt/presto/ssl
    cd /opt/presto/ssl
    
    openssl req -newkey rsa:2048 -nodes -keyout privatekey.key -x509 -days 365 -out certificate.crt -passin pass:sslpassphrase -subj "/OU=presto/CN=presto.mintopsblog.local/"
    openssl pkcs12 -inkey privatekey.key -in certificate.crt -export -out bundle.p12 -passin pass:sslpassphrase -passout pass:sslpassphrase
    keytool -noprompt -importkeystore -srckeystore bundle.p12 -srcstoretype pkcs12 -srcstorepass sslpassphrase -destkeystore presto.jks -deststoretype JKS -deststorepass sslpassphrase
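The certificate created above is only valid for 365 days, and an expired certificate will break HTTPS between the nodes and the client. As a rough sketch (the helper name `cert_valid_for_days` is an assumption for this example), the standard `openssl x509 -checkend` option can be used to check how much validity is left:

```shell
#!/bin/sh
# cert_valid_for_days CERT_FILE DAYS   (hypothetical helper)
# Returns 0 if the PEM certificate will still be valid DAYS days from now.
cert_valid_for_days() {
    cert_file=$1
    days=$2
    # -checkend takes seconds and exits non-zero if the certificate
    # expires within that window.
    openssl x509 -in "$cert_file" -noout -checkend $((days * 86400)) >/dev/null 2>&1
}
```

For example, `cert_valid_for_days /opt/presto/ssl/certificate.crt 30 || echo "certificate expires within 30 days"` could be run from a scheduled job.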
  • Configure the jvm.config next (set the Java heap size, -Xmx, to what your workload actually needs and what the machine can spare):
    cat > /opt/presto/presto-server-0.198/etc/jvm.config << EOF
    -server
    -Xmx4G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError
    EOF
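The heap size in jvm.config and query.max-memory-per-node in config.properties need to agree: the per-node query limit has to fit inside the JVM heap. The sketch below only checks the coarse condition that the per-node limit is strictly smaller than -Xmx. The exact headroom Presto requires depends on the version, and the helper names `to_mb` and `heap_fits` are made up for this example.

```shell
#!/bin/sh
# to_mb SIZE   (hypothetical helper)
# Converts sizes such as 4G, 8GB, 512M or 512MB to megabytes.
to_mb() {
    value=$(printf '%s' "$1" | sed 's/[^0-9].*$//')
    unit=$(printf '%s' "$1" | sed 's/^[0-9]*//' | tr '[:lower:]' '[:upper:]')
    if [ -z "$value" ]; then
        echo "unsupported size: $1" >&2
        return 1
    fi
    case "$unit" in
        G|GB) echo $((value * 1024)) ;;
        M|MB) echo "$value" ;;
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# heap_fits XMX PER_NODE   (hypothetical helper)
# Returns 0 if PER_NODE is strictly smaller than XMX, 1 if not, 2 on bad input.
heap_fits() {
    xmx_mb=$(to_mb "$1") || return 2
    per_node_mb=$(to_mb "$2") || return 2
    [ "$per_node_mb" -lt "$xmx_mb" ]
}
```

With the values used in this guide, `heap_fits 4G 1GB` succeeds, while `heap_fits 4G 8GB` fails, which is a hint that the 8GB per-node limit from the coordinator and worker variants needs a larger -Xmx than the 4G shown here.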
  • Configure the log.properties with the following:
    echo -e "com.facebook.presto=INFO" > /opt/presto/presto-server-0.198/etc/log.properties
  • And lastly the node.properties config (you can generate a UUID with the ‘uuidgen’ command; every node needs its own unique node.id):
    cat > /opt/presto/presto-server-0.198/etc/node.properties << EOF
    node.environment=mintopsblog
    node.id=5a511258-3e86-4496-a2bb-21c8916871e8
    node.data-dir=/opt/presto/presto-data
    EOF
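Since every node needs its own node.id, and that id should stay the same across restarts, the file is better generated once than copied between machines. A minimal sketch, assuming either `uuidgen` or the Linux `/proc/sys/kernel/random/uuid` file is available (the helper names are made up for this example):

```shell
#!/bin/sh
# new_uuid   (hypothetical helper): prints a fresh UUID.
new_uuid() {
    if command -v uuidgen >/dev/null 2>&1; then
        uuidgen
    else
        cat /proc/sys/kernel/random/uuid
    fi
}

# write_node_properties TARGET ENVIRONMENT DATA_DIR   (hypothetical helper)
# Writes node.properties, but leaves an existing file with a node.id untouched
# so that the id survives re-runs.
write_node_properties() {
    target=$1
    environment=$2
    data_dir=$3
    if [ -f "$target" ] && grep -q '^node\.id=' "$target"; then
        return 0
    fi
    {
        echo "node.environment=${environment}"
        echo "node.id=$(new_uuid)"
        echo "node.data-dir=${data_dir}"
    } > "$target"
}
```

For this guide the call would be `write_node_properties /opt/presto/presto-server-0.198/etc/node.properties mintopsblog /opt/presto/presto-data`.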
  • The core configuration is now in place, but you still need a few small config files to connect PrestoDB to a service (e.g. Hive, Kafka).
  • To do this, create another directory called ‘catalog’ under Presto’s ‘etc’ directory. This directory holds the configuration for each service that PrestoDB will query. (Make sure there is network connectivity between the Presto server and the service you want to query.)
    mkdir -p /opt/presto/presto-server-0.198/etc/catalog
  • Presto/Kafka Connector configuration

    cat > /opt/presto/presto-server-0.198/etc/catalog/kafka.properties << EOF
    connector.name=kafka
    kafka.nodes=kafka.mintopsblog.local:9092
    kafka.table-names=mintopsblogtest
    kafka.hide-internal-columns=false
    EOF
  • You can also set up a JSON file that describes the topic’s attributes, so that Presto can map each attribute to its own column, e.g.:
    mkdir -p /opt/presto/presto-server-0.198/etc/kafka
    touch /opt/presto/presto-server-0.198/etc/kafka/mintopsblogtest.json
    {
      "tableName": "mintopsblogtest",
      "schemaName": "default",
      "topicName": "mintopsblogtest",
      "key": {
        "dataFormat": "raw",
        "fields": [
          {
            "name": "kafka_key",
            "dataFormat": "LONG",
            "type": "BIGINT",
            "hidden": "false"
          }
        ]
      },
      "message": {
        "dataFormat": "json",
        "fields": [
          {
            "name": "firstname",
            "mapping": "firstName",
            "type": "VARCHAR"
          },
          {
            "name": "lastname",
            "mapping": "lastName",
            "type": "VARCHAR"
          },
          {
            "name": "timestamp",
            "mapping": "timestamp",
            "type": "TIMESTAMP",
            "dataFormat": "iso8601"
          }
        ]
      }
    }
  • Presto/Hive Connector configuration

    cat > /opt/presto/presto-server-0.198/etc/catalog/hive.properties << EOF
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive.mintopsblog.local:9093
    hive.config.resources=/opt/presto/hive/core-site.xml, /opt/presto/hive/hdfs-site.xml
    EOF
  • Presto Connectors Documentation

    • You can find more Presto connector documentation and configuration at the following URL: PrestoDB Connectors

  • After everything is configured, the PrestoDB Coordinator server can be launched.
    su presto /opt/presto/presto-server-0.198/bin/launcher start
  • You can then see the logs in the previously configured “presto-data” directory
    tail -f /opt/presto/presto-data/var/log/launcher.log
    tail -f /opt/presto/presto-data/var/log/server.log
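  • The coordinator web UI is then reachable at https://hostname:port (with the configuration above, https://presto.mintopsblog.local:5665)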
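Instead of watching the log by eye, a script can wait for the line containing SERVER STARTED that Presto writes to server.log once startup has finished. This is only a sketch: the function name `wait_for_started` and the default timeout are assumptions for this example.

```shell
#!/bin/sh
# wait_for_started LOG_FILE [TIMEOUT_SECONDS]   (hypothetical helper)
# Polls LOG_FILE once per second until it contains "SERVER STARTED".
# Returns 0 when the marker is found, 1 if the timeout is reached first.
wait_for_started() {
    log_file=$1
    timeout=${2:-120}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -f "$log_file" ] && grep -q 'SERVER STARTED' "$log_file"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example: `wait_for_started /opt/presto/presto-data/var/log/server.log 180 || echo "Presto did not start in time"`. Note that a marker left in the log by an earlier run would also match, so this works best on a fresh or rotated log.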

PrestoDB Client Connectivity

Once the PrestoDB coordinator and workers are started and working correctly, you can use the Presto client to start running queries.

  • Download the PrestoDB client from the following URL: PrestoDB Client
    sudo wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.198/presto-cli-0.198-executable.jar
    mkdir -p /opt/presto/presto-client
    sudo mv presto-cli-0.198-executable.jar /opt/presto/presto-client/presto
    chmod +x /opt/presto/presto-client/presto
  • Then run the presto client with the necessary arguments (you can change the catalog depending on which service you would like to query)
    /opt/presto/presto-client/presto --server https://presto.mintopsblog.local:5665 --catalog kafka --schema default --truststore-path /opt/presto/ssl/presto.jks --truststore-password sslpassphrase
  • After connecting, you could then do a simple select query and see the query running on the coordinator
    select * from <topicname/tablename> limit 1;
  • More details in the following documentation: Kafka Connector
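For scripted checks the client can also run a single statement and exit, using its `--execute` option, instead of opening the interactive prompt. The wrapper below is a sketch: `run_query` and the `PRESTO_CLI` variable are names invented for this example, while the server, truststore and password values are the ones used throughout this guide.

```shell
#!/bin/sh
# Path to the client; can be overridden from the environment.
PRESTO_CLI=${PRESTO_CLI:-/opt/presto/presto-client/presto}

# run_query CATALOG SQL   (hypothetical helper)
# Runs one statement non-interactively against the coordinator from this guide.
run_query() {
    catalog=$1
    sql=$2
    "$PRESTO_CLI" \
        --server https://presto.mintopsblog.local:5665 \
        --catalog "$catalog" \
        --schema default \
        --truststore-path /opt/presto/ssl/presto.jks \
        --truststore-password sslpassphrase \
        --execute "$sql"
}
```

For example, `run_query kafka 'select * from mintopsblogtest limit 1'` would query the Kafka topic configured earlier.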

PrestoDB Script

Use the following script to install PrestoDB on a single machine


#!/bin/bash

###Prerequisites:
#CentOS/RedHat 7
#Java (>= 8)

### Prompt user before installation
read -p "Are you sure you want to install PrestoDB? " prompt
if [[ $prompt == "y" || $prompt == "Y" || $prompt == "yes" || $prompt == "Yes" ]]
then

HOSTNAME_FILE='/etc/hostname'
read -p "Please specify a hostname for this host: " HOSTNAME
echo "$HOSTNAME" > $HOSTNAME_FILE
echo -e "Hostname changed to: `cat $HOSTNAME_FILE`"

###Adding IP and Hostname to the /etc/hosts. Delete anything after the 2nd line inside /etc/hosts
# Take the first global IPv4 address and strip the /prefix (e.g. 172.19.30.1/24 -> 172.19.30.1)
IPADDR=$(ip -4 addr show scope global | awk '/inet /{print $2}' | cut -d/ -f1 | head -n 1)
sed -i 3,50d /etc/hosts
echo -e "\n$IPADDR $HOSTNAME" >> /etc/hosts

###Install EPEL repository since RHEL does not provide this
EPEL_URL='https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm'
yum -y install $EPEL_URL &> /dev/null
echo -e "EPEL RPM Installed"

###Installing basic services
NEW_SERVICES='wget unzip ntp sudo'
echo -e "Installing following services: $NEW_SERVICES"
yum -y install $NEW_SERVICES &> /dev/null
echo -e "Installation completed"

###Remove Chrony so that it doesn't impact the NTP Service
RMV_SERVICES='chrony'
echo -e "Removing services: $RMV_SERVICES"
yum -y remove $RMV_SERVICES &> /dev/null
echo -e "Removed successfully"

###Disable ssl OS verification inside /etc/python/cert-verification.cfg
sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg

###Installing JAVA 1.8 and creating necessary symlinks
APP_PATH="/opt/oracle"
JDK_URL='http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz'
JDK_TAR="jdk-8u161-linux-x64.tar.gz"
JDK_SRC="$APP_PATH"
JDK_VERSION="jdk1.8.0_161"
JDK_SYMSRC="$JDK_SRC/$JDK_VERSION"
JDK_SYMDEST="$APP_PATH/java"
JDK_PROFILE="/etc/profile.d/java.sh"

echo -e "Installing following JAVA JDK: $JDK_VERSION"
wget --no-check-certificate -c --header "Cookie: oraclelicense=accept-securebackup-cookie" $JDK_URL &> /dev/null

echo -e "Extracting $JDK_TAR"
tar xvf $JDK_TAR &> /dev/null
rm -rf $JDK_TAR
mkdir -p $JDK_SRC
mv $JDK_VERSION $JDK_SRC
rm -rf $JDK_VERSION
chmod -R 755 $JDK_SRC
ln -s $JDK_SYMSRC $JDK_SYMDEST

###Setting the JAVA_HOME variable
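echo "export JAVA_HOME=$JDK_SYMDEST" > $JDK_PROFILE
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> $JDK_PROFILE
source $JDK_PROFILE

###Disable SELinux
# Assumed values, matching the manual SELinux step earlier in this guide
SELINUX_PATH='/etc/selinux/config'
SELINUX='disabled'
setenforce 0
sed -i "s/^SELINUX=.*/SELINUX=${SELINUX}/" $SELINUX_PATH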

###Starting the required services (mainly NTP and auditd)
echo -e "Enabling NTPD and setting the timezone to UTC"
systemctl enable ntpd
systemctl start ntpd
timedatectl set-timezone UTC

###Download PrestoDB Tarball
USER='presto'
PRESTO_PATH='/opt/presto'
PRESTO_DATA="$PRESTO_PATH/presto-data"
PRESTO_URL='https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.198/presto-server-0.198.tar.gz'
PRESTO_TGZ='presto-server-0.198.tar.gz'
PRESTO_VERSION='presto-server-0.198'
PRESTO_CLT_URL='https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.198/presto-cli-0.198-executable.jar'
PRESTO_CLT_JAR='presto-cli-0.198-executable.jar'
PRESTO_CLT_DIR='presto-client'

mkdir -p $PRESTO_PATH
mkdir -p $PRESTO_DATA
mkdir -p $PRESTO_PATH/$PRESTO_CLT_DIR

useradd $USER
wget $PRESTO_URL
tar -xvf $PRESTO_TGZ
rm -rf $PRESTO_TGZ
mv $PRESTO_VERSION $PRESTO_PATH
mkdir -p $PRESTO_PATH/$PRESTO_VERSION/etc/catalog
wget $PRESTO_CLT_URL
mv $PRESTO_CLT_JAR $PRESTO_PATH/$PRESTO_CLT_DIR/presto
chown -R $USER $PRESTO_PATH
chmod -R 755 $PRESTO_PATH
chmod +x $PRESTO_PATH/$PRESTO_CLT_DIR/presto

#######SCRIPT WARNING#######
echo "This script will create both keystore and truststore for the current host."
echo "Please be sure to check the parameters inside the script before proceeding."

### Create SSL self-signed certificate
CERTIFICATE='certificate.crt'
PRIVATEKEY='privatekey.key'
PRIVATEKEYPASS='sslpassphrase'
P12='bundle.p12'
KEYSTORE="presto.jks"
SSLPATH="$PRESTO_PATH/ssl"
VALIDITY='365'

#######Self-signed certificate#######
sudo mkdir -p $SSLPATH
cd $SSLPATH

openssl req -newkey rsa:2048 -nodes -keyout $PRIVATEKEY -x509 -days $VALIDITY -out $CERTIFICATE -passin pass:$PRIVATEKEYPASS -subj "/OU=$USER/CN=$HOSTNAME/"
openssl pkcs12 -inkey $PRIVATEKEY -in $CERTIFICATE -export -out $P12 -passin pass:$PRIVATEKEYPASS -passout pass:$PRIVATEKEYPASS
keytool -noprompt -importkeystore -srckeystore $P12 -srcstoretype pkcs12 -srcstorepass $PRIVATEKEYPASS -destkeystore $KEYSTORE -deststoretype JKS -deststorepass $PRIVATEKEYPASS

# Trusting CA Certificate
cp $CERTIFICATE /etc/pki/ca-trust/source/anchors/
update-ca-trust extract

### Increase file limits ###
echo -e "$USER soft nofile 64000" >> /etc/security/limits.conf
echo -e "$USER hard nofile 64000" >> /etc/security/limits.conf

### Presto - Config Properties ###
PRESTO_CONFIG_PROPERTIES="$PRESTO_PATH/$PRESTO_VERSION/etc/config.properties"
PRESTO_JVM_CONFIG="$PRESTO_PATH/$PRESTO_VERSION/etc/jvm.config"
PRESTO_NODE_PROPERTIES="$PRESTO_PATH/$PRESTO_VERSION/etc/node.properties"
PRESTO_LOG_PROPERTIES="$PRESTO_PATH/$PRESTO_VERSION/etc/log.properties"
PRESTO_KAFKA_PROPERTIES="$PRESTO_PATH/$PRESTO_VERSION/etc/catalog/kafka.properties"
PRESTO_HIVE_PROPERTIES="$PRESTO_PATH/$PRESTO_VERSION/etc/catalog/hive.properties"
PRESTO_HTTPS='true'
PRESTO_PORT='5665'
PRESTO_COORDINATOR_WORKERNODE='true'
PRESTO_COORDINATOR_STATE='true'
PRESTO_DISCOVERY='true'
PRESTO_MAX_MEMORY='8GB'
PRESTO_MAX_MEMORY_PER_NODE='1GB'
echo -e "coordinator=${PRESTO_COORDINATOR_STATE}" > $PRESTO_CONFIG_PROPERTIES
echo -e "node-scheduler.include-coordinator=${PRESTO_COORDINATOR_WORKERNODE}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "http-server.https.enabled=${PRESTO_HTTPS}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "http-server.https.port=${PRESTO_PORT}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "query.max-memory=${PRESTO_MAX_MEMORY}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "query.max-memory-per-node=${PRESTO_MAX_MEMORY_PER_NODE}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "discovery-server.enabled=${PRESTO_DISCOVERY}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "discovery.uri=https://${HOSTNAME}:${PRESTO_PORT}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "node.internal-address=${HOSTNAME}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "http-server.https.keystore.path=${SSLPATH}/${KEYSTORE}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "http-server.https.keystore.key=${PRIVATEKEYPASS}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "internal-communication.https.required=${PRESTO_HTTPS}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "internal-communication.https.keystore.path=${SSLPATH}/${KEYSTORE}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "internal-communication.https.keystore.key=${PRIVATEKEYPASS}" >> $PRESTO_CONFIG_PROPERTIES
echo -e "http-server.https.secure-random-algorithm=SHA1PRNG" >> $PRESTO_CONFIG_PROPERTIES

### Presto - JVM Config ###
PRESTO_JAVAXMX='-Xmx16G'
echo -e "-server" > $PRESTO_JVM_CONFIG
echo -e "${PRESTO_JAVAXMX}" >> $PRESTO_JVM_CONFIG
echo -e "-XX:+UseG1GC" >> $PRESTO_JVM_CONFIG
echo -e "-XX:G1HeapRegionSize=32M" >> $PRESTO_JVM_CONFIG
echo -e "-XX:+UseGCOverheadLimit" >> $PRESTO_JVM_CONFIG
echo -e "-XX:+ExplicitGCInvokesConcurrent" >> $PRESTO_JVM_CONFIG
echo -e "-XX:+HeapDumpOnOutOfMemoryError" >> $PRESTO_JVM_CONFIG
echo -e "-XX:+ExitOnOutOfMemoryError" >> $PRESTO_JVM_CONFIG

### Presto - Node Properties ###
PRESTO_NODE_UUID='/tmp/uuid'
PRESTO_ENVIRONMENT='development'
uuidgen > $PRESTO_NODE_UUID
echo -e "node.environment=${PRESTO_ENVIRONMENT}" > $PRESTO_NODE_PROPERTIES
echo -e "node.id=`cat ${PRESTO_NODE_UUID}`" >> $PRESTO_NODE_PROPERTIES
echo -e "node.data-dir=${PRESTO_DATA}" >> $PRESTO_NODE_PROPERTIES

### Presto - Log Properties ###
echo -e "com.facebook.presto=INFO" > $PRESTO_LOG_PROPERTIES

### Presto - Kafka Properties ###
touch $PRESTO_KAFKA_PROPERTIES
read -p "Please specify the kafka hosts (e.g hostname:port): " KAFKA_HOSTS
read -p "Please specify the kafka topics (comma separated): " KAFKA_TOPICS
echo -e "connector.name=kafka" > $PRESTO_KAFKA_PROPERTIES
echo -e "kafka.nodes=${KAFKA_HOSTS}" >> $PRESTO_KAFKA_PROPERTIES
echo -e "kafka.table-names=${KAFKA_TOPICS}" >> $PRESTO_KAFKA_PROPERTIES
echo -e "kafka.hide-internal-columns=false" >> $PRESTO_KAFKA_PROPERTIES

### Presto - Hive Properties ###
mkdir -p $PRESTO_PATH/hive
touch $PRESTO_HIVE_PROPERTIES
read -p "Please specify the hive hosts (e.g thrift://hostname:port): " HIVE_HOSTS
HIVE_CONFIG="${PRESTO_PATH}/hive/core-site.xml,${PRESTO_PATH}/hive/hdfs-site.xml"
echo -e "connector.name=hive-hadoop2" > $PRESTO_HIVE_PROPERTIES
echo -e "hive.metastore.uri=${HIVE_HOSTS}" >> $PRESTO_HIVE_PROPERTIES
echo -e "hive.config.resources=${HIVE_CONFIG}" >> $PRESTO_HIVE_PROPERTIES

clear

su $USER $PRESTO_PATH/$PRESTO_VERSION/bin/launcher start

PRESTOURL='https://'`cat /etc/hostname`
WEBPORT='9000'
echo "PrestoDB setup is done, please use the following to try it out:"
echo "Coordinator URL: $PRESTOURL:$PRESTO_PORT"
echo "Using the client: $PRESTO_PATH/$PRESTO_CLT_DIR/presto --server $PRESTOURL:$PRESTO_PORT --catalog kafka --schema default --truststore-path $SSLPATH/$KEYSTORE --truststore-password $PRIVATEKEYPASS"

else
exit 0
fi


PrestoDB Custom Docker Image


FROM centos:latest
LABEL maintainer="MintOpsBlog"
LABEL environment="Development/QA"

### GENERAL VARIABLES ###

ENV USER 'presto'
ENV APPLICATION 'prestodb'
ENV HOSTNAME "${APPLICATION}.mintopsblog.local"
ENV PRESTO_PATH '/opt/presto'
ENV JAVA_SRC '/opt/oracle'

### JAVA VARIABLES ###

ENV JAVA_SYMLINK "${JAVA_SRC}/java"
ENV JAVA_URL 'http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz'
ENV JAVA_JDK 'jdk-8u161-linux-x64.tar.gz'
ENV JAVA_VERSION 'jdk1.8.0_161'
ENV JDK_PROFILE '/etc/profile.d/javahome.sh'

### SSL VARIABLES ###

ENV CERTIFICATE 'certificate.crt'
ENV PRIVATEKEY 'privatekey.key'
ENV PRIVATEKEYPASS 'sslpassphrase'
ENV P12 'bundle.p12'
ENV KEYSTORE "${APPLICATION}.jks"
ENV SSLPATH "${PRESTO_PATH}/ssl"
ENV VALIDITY '365'

### PRESTO SERVER VARIABLES ###

ENV PRESTO_URL 'https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.198/presto-server-0.198.tar.gz'
ENV PRESTO_TGZ 'presto-server-0.198.tar.gz'
ENV PRESTO_VERSION 'presto-server-0.198'
ENV PRESTO_HTTPS 'true'
ENV PRESTO_PORT '5665'
ENV PRESTO_COORDINATOR_WORKERNODE 'true'
ENV PRESTO_COORDINATOR_STATE 'true'
ENV PRESTO_DISCOVERY 'true'
ENV PRESTO_MAX_MEMORY '8GB'
ENV PRESTO_MAX_MEMORY_PER_NODE '1GB'
ENV PRESTO_JAVAXMX '-Xmx2G'
ENV PRESTO_ENVIRONMENT 'development'
ENV PRESTO_DATA "${PRESTO_PATH}/${APPLICATION}/presto-data"
ENV PRESTO_NODE_UUID '/tmp/uuid'

### PRESTO CLIENT VARIABLES ###

ENV PRESTO_CLT_URL 'https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.198/presto-cli-0.198-executable.jar'
ENV PRESTO_CLT_JAR 'presto-cli-0.198-executable.jar'
ENV PRESTO_CLT_DIR 'presto-client'

### PRESTO CONFIGURATION VARIABLES ###

ENV PRESTO_CONFIG_PROPERTIES "${PRESTO_PATH}/${APPLICATION}/${PRESTO_VERSION}/etc/config.properties"
ENV PRESTO_JVM_CONFIG "${PRESTO_PATH}/${APPLICATION}/${PRESTO_VERSION}/etc/jvm.config"
ENV PRESTO_NODE_PROPERTIES "${PRESTO_PATH}/${APPLICATION}/${PRESTO_VERSION}/etc/node.properties"
ENV PRESTO_LOG_PROPERTIES "${PRESTO_PATH}/${APPLICATION}/${PRESTO_VERSION}/etc/log.properties"
ENV PRESTO_KAFKA_PROPERTIES "${PRESTO_PATH}/${APPLICATION}/${PRESTO_VERSION}/etc/catalog/kafka.properties"
ENV PRESTO_HIVE_PROPERTIES "${PRESTO_PATH}/${APPLICATION}/${PRESTO_VERSION}/etc/catalog/hive.properties"

### HIVE VARIABLES ###

ENV HIVE_HOSTS "thrift://hostname:9083"
ENV HIVE_CONFIG "${PRESTO_PATH}/${APPLICATION}/hive/core-site.xml,${PRESTO_PATH}/${APPLICATION}/hive/hdfs-site.xml"

### KAFKA VARIABLES ###

ENV KAFKA_HOSTS "hostname:port"
ENV KAFKA_TOPICS "topicname"

RUN set -x \

&& yum -y install wget epel-release net-tools openssl \
### Presto Tarball ###
&& wget ${PRESTO_URL} \
&& tar -xzf ${PRESTO_TGZ} \
&& rm -rf ${PRESTO_TGZ} \
&& mkdir -p ${PRESTO_PATH}/${APPLICATION} \
&& mkdir -p ${PRESTO_DATA} \
&& mkdir -p ${PRESTO_PATH}/${APPLICATION}/hive \
&& mv ${PRESTO_VERSION} ${PRESTO_PATH}/${APPLICATION} \
&& mkdir -p ${PRESTO_PATH}/${APPLICATION}/${PRESTO_VERSION}/etc/catalog \
&& wget ${PRESTO_CLT_URL} \
&& mkdir -p ${PRESTO_PATH}/${APPLICATION}/${PRESTO_CLT_DIR} \
&& mv ${PRESTO_CLT_JAR} ${PRESTO_PATH}/${APPLICATION}/${PRESTO_CLT_DIR}/presto \
&& useradd ${USER} \
&& chown -R ${USER} ${PRESTO_PATH}/${APPLICATION} \
&& chmod -R 755 ${PRESTO_PATH}/${APPLICATION} \
&& chmod +x ${PRESTO_PATH}/${APPLICATION}/${PRESTO_CLT_DIR}/presto \

### Setting JAVA_HOME ###
&& mkdir -p ${JAVA_SRC} \
&& cd ${JAVA_SRC} \
&& wget --no-check-certificate -c --header "Cookie: oraclelicense=accept-securebackup-cookie" ${JAVA_URL} \
&& tar xvf ${JAVA_JDK} \
&& rm -rf ${JAVA_JDK} \
&& ln -s ${JAVA_SRC}/${JAVA_VERSION} ${JAVA_SYMLINK} \
&& echo "export JAVA_HOME=${JAVA_SYMLINK}" > ${JDK_PROFILE} \
&& echo 'PATH=$JAVA_HOME/bin:$PATH' >> ${JDK_PROFILE} \
&& source ${JDK_PROFILE} \

### Creating Self-Signed Certificate ###
&& mkdir -p ${SSLPATH} \
&& cd ${SSLPATH} \
&& openssl req -newkey rsa:2048 -nodes -keyout ${PRIVATEKEY} -x509 -days ${VALIDITY} -out ${CERTIFICATE} -passin pass:${PRIVATEKEYPASS} -subj "/OU=${APPLICATION}/CN=${HOSTNAME}/" \
&& openssl pkcs12 -inkey ${PRIVATEKEY} -in ${CERTIFICATE} -export -out ${P12} -passin pass:${PRIVATEKEYPASS} -passout pass:${PRIVATEKEYPASS} \
&& keytool -noprompt -importkeystore -srckeystore ${P12} -srcstoretype pkcs12 -srcstorepass ${PRIVATEKEYPASS} -destkeystore ${KEYSTORE} -deststoretype JKS -deststorepass ${PRIVATEKEYPASS} \

### Presto - Config Properties ###
&& echo -e "coordinator=${PRESTO_COORDINATOR_STATE}" > ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "node-scheduler.include-coordinator=${PRESTO_COORDINATOR_WORKERNODE}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "http-server.https.enabled=${PRESTO_HTTPS}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "http-server.https.port=${PRESTO_PORT}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "query.max-memory=${PRESTO_MAX_MEMORY}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "query.max-memory-per-node=${PRESTO_MAX_MEMORY_PER_NODE}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "discovery-server.enabled=${PRESTO_DISCOVERY}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "discovery.uri=https://${HOSTNAME}:${PRESTO_PORT}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "node.internal-address=${HOSTNAME}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "http-server.https.keystore.path=${SSLPATH}/${KEYSTORE}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "http-server.https.keystore.key=${PRIVATEKEYPASS}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "internal-communication.https.required=${PRESTO_HTTPS}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "internal-communication.https.keystore.path=${SSLPATH}/${KEYSTORE}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "internal-communication.https.keystore.key=${PRIVATEKEYPASS}" >> ${PRESTO_CONFIG_PROPERTIES} \
&& echo -e "http-server.https.secure-random-algorithm=SHA1PRNG" >> ${PRESTO_CONFIG_PROPERTIES} \

### Presto - JVM Config ###
&& echo -e "-server" > ${PRESTO_JVM_CONFIG} \
&& echo -e "${PRESTO_JAVAXMX}" >> ${PRESTO_JVM_CONFIG} \
&& echo -e "-XX:+UseG1GC" >> ${PRESTO_JVM_CONFIG} \
&& echo -e "-XX:G1HeapRegionSize=32M" >> ${PRESTO_JVM_CONFIG} \
&& echo -e "-XX:+UseGCOverheadLimit" >> ${PRESTO_JVM_CONFIG} \
&& echo -e "-XX:+ExplicitGCInvokesConcurrent" >> ${PRESTO_JVM_CONFIG} \
&& echo -e "-XX:+HeapDumpOnOutOfMemoryError" >> ${PRESTO_JVM_CONFIG} \
&& echo -e "-XX:+ExitOnOutOfMemoryError" >> ${PRESTO_JVM_CONFIG} \

### Presto - Node Properties ###
&& uuidgen > ${PRESTO_NODE_UUID} \
&& echo -e "node.environment=${PRESTO_ENVIRONMENT}" > ${PRESTO_NODE_PROPERTIES} \
&& echo -e "node.id=`cat ${PRESTO_NODE_UUID}`" >> ${PRESTO_NODE_PROPERTIES} \
&& echo -e "node.data-dir=${PRESTO_DATA}" >> ${PRESTO_NODE_PROPERTIES} \

### Presto - Log Properties ###
&& echo -e "com.facebook.presto=INFO" > ${PRESTO_LOG_PROPERTIES} \

### Presto - Kafka Properties ###
&& touch ${PRESTO_KAFKA_PROPERTIES} \
&& echo -e "connector.name=kafka" > ${PRESTO_KAFKA_PROPERTIES} \
&& echo -e "kafka.nodes=${KAFKA_HOSTS}" >> ${PRESTO_KAFKA_PROPERTIES} \
&& echo -e "kafka.table-names=${KAFKA_TOPICS}" >> ${PRESTO_KAFKA_PROPERTIES} \
&& echo -e "kafka.hide-internal-columns=false" >> ${PRESTO_KAFKA_PROPERTIES} \

### Presto - Hive Properties ###
&& touch ${PRESTO_HIVE_PROPERTIES} \
&& echo -e "connector.name=hive-hadoop2" > ${PRESTO_HIVE_PROPERTIES} \
&& echo -e "hive.metastore.uri=${HIVE_HOSTS}" >> ${PRESTO_HIVE_PROPERTIES} \
&& echo -e "hive.config.resources=${HIVE_CONFIG}" >> ${PRESTO_HIVE_PROPERTIES}

ADD presto-configs ${PRESTO_PATH}/${APPLICATION}/hive
EXPOSE ${PRESTO_PORT}
USER ${USER}
CMD ["/opt/presto/prestodb/presto-server-0.198/bin/launcher", "run"]


Hope this guide helps you out. If you have any difficulties, don’t hesitate to post a comment. Feel free to point out any mistakes or possible improvements in the guide as well.
