With this option, you create a virtual machine template from an existing CentOS 7 virtual machine, use Terraform on the jumpbox virtual machine to generate the copies of the template that make up the Greenplum Database cluster, and then deploy Greenplum Database on that cluster.

Creating the Virtual Machine Template

In this section, you clone a virtual machine from an existing CentOS 7 virtual machine, perform a series of configuration changes, and create a template from it. Finally, you verify that it was configured correctly by deploying a test virtual machine from the newly created template and checking its configuration.

Preparing the Virtual Machine

Create a template from an existing virtual machine. You must have a running CentOS 7 virtual machine in the datastore and cluster where you deploy the Greenplum environment.

  1. Log in to vCenter and navigate to Hosts and Clusters.
  2. Right-click your existing CentOS 7 virtual machine.
  3. Select Clone > Clone to Virtual Machine.
  4. Enter greenplum-db-template-vm as the virtual machine name, then click Next.
  5. Select your cluster, then click Next.
  6. Select the vSAN datastore and select Keep existing VM storage policies for VM Storage Policy, then click Next.
  7. Under Select clone options, select Power on virtual machine after creation and Customize this virtual machine's hardware, then click Next.
  8. Under Customize hardware, check the number of hard disks configured for this virtual machine. If there is only one, add a second one by clicking Add new device > Hard Disk.
  9. Edit the existing network adapter New Network so it connects to the gp-virtual-external port group.
    1. If you are using DHCP, a new IP address will be assigned to this interface. If you are using static IP assignment, you must manually set up the IP address in a later step.
  10. Review your configuration, then click Finish.
  11. Once the virtual machine is powered on, launch the Web Console and log in as root. Check the virtual machine IP address by running ip a. If you are using static IP assignment, you must manually set it up:
    1. Edit the file /etc/sysconfig/network-scripts/ifcfg-<interface-name>.

    2. Enter the network information provided by your network administrator for the gp-virtual-external network. For example:

      BOOTPROTO=none
      IPADDR=10.202.89.10
      NETMASK=255.255.255.0
      GATEWAY=10.202.89.1
      DNS1=1.0.0.1
      DNS2=1.1.1.1
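
If you prefer to generate these settings rather than type them, the following sketch prints the same lines. The function name make_ifcfg is hypothetical, and the arguments are the example addresses above; review the output before adding it to /etc/sysconfig/network-scripts/ifcfg-<interface-name>, and keep the lines your file already has for the interface itself (such as DEVICE and ONBOOT).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print static IPv4 settings in ifcfg key=value form.
# Arguments: IP address, netmask, gateway, then one or more DNS servers.
make_ifcfg() {
  local ipaddr=$1 netmask=$2 gateway=$3
  shift 3
  echo "BOOTPROTO=none"
  echo "IPADDR=${ipaddr}"
  echo "NETMASK=${netmask}"
  echo "GATEWAY=${gateway}"
  # Number the DNS entries DNS1, DNS2, ... in the order given.
  local n=1 dns
  for dns in "$@"; do
    echo "DNS${n}=${dns}"
    n=$((n + 1))
  done
}

# Reproduces the example settings shown above.
make_ifcfg 10.202.89.10 255.255.255.0 10.202.89.1 1.0.0.1 1.1.1.1
```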
      

Performing System Configuration

Configure the newly cloned virtual machine to support a Greenplum Database system.

  1. Log in to the cloned virtual machine greenplum-db-template-vm as user root.

  2. Verify that VMware Tools is installed. Refer to Installing VMware Tools for instructions.

  3. Disable the following services:

    1. Disable SELinux by editing the /etc/selinux/config file. Change the value of the SELINUX parameter in the configuration file as follows:

      SELINUX=disabled
      
    2. Check that the System Security Services Daemon (SSSD) is installed:

      $ yum list sssd | grep -i "Installed Packages"
      

      If SSSD is installed, edit /etc/sssd/sssd.conf and add the following line, which sets the selinux_provider parameter to none. This prevents SELinux-related SSH authentication denials, which can occur even when SELinux is disabled. If SSSD is not installed, skip this step.

      selinux_provider=none
      
    3. Disable the firewall service (firewalld):

      $ systemctl stop firewalld
      $ systemctl disable firewalld
      $ systemctl mask --now firewalld
      
    4. Disable the Tuned daemon:

      $ systemctl stop tuned
      $ systemctl disable tuned
      $ systemctl mask --now tuned
      
    5. Disable Chrony:

      $ systemctl stop chronyd
      $ systemctl disable chronyd
      $ systemctl mask --now chronyd
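
The three services above are disabled with the same stop, disable, and mask sequence, so the sequence can be driven from a loop. This is a sketch, not part of the original procedure: the hypothetical function disable_services only prints the systemctl commands by default so that you can review them, and runs them only when the hypothetical variable APPLY is set to 1 in a root shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop, disable, and mask each named systemd unit.
# By default it only prints the commands; set APPLY=1 to execute them.
disable_services() {
  local svc action
  for svc in "$@"; do
    for action in "stop" "disable" "mask --now"; do
      if [ "${APPLY:-0}" = "1" ]; then
        # Unquoted $action is intended: "mask --now" becomes two arguments.
        systemctl $action "$svc"
      else
        echo "systemctl $action $svc"
      fi
    done
  done
}

# Dry run: print the nine commands for the services in this step.
disable_services firewalld tuned chronyd
```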
      
  4. Back up the boot files:

    $ cp /etc/default/grub /etc/default/grub-backup
    $ cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg-backup
    
  5. Add the following boot parameters:

    1. Disable Transparent Huge Pages (THP):

      $ grubby --update-kernel=ALL --args="transparent_hugepage=never"
      
    2. Add the parameter elevator=deadline:

      $ grubby --update-kernel=ALL --args="elevator=deadline"
      
  6. Install and enable the ntp daemon:

    $ yum install -y ntp
    $ systemctl enable ntpd
    
  7. Configure the NTP servers:

    1. Remove all unwanted servers from /etc/ntp.conf. For example:

      ...
      # Use public servers from the pool.ntp.org project.
      # Please consider joining the pool (http://www.pool.ntp.org/join.html).
      server 0.centos.pool.ntp.org iburst
      ...
      
    2. Add an entry for each server to /etc/ntp.conf:

      server <data center's NTP time server 1>
      server <data center's NTP time server 2>
      ...
      server <data center's NTP time server N>
      
    3. Add the master and standby master hosts (mdw and smdw) to /etc/ntp.conf, after the data center NTP servers:

      server <data center's NTP time server N>
      ...
      server mdw
      server smdw
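
The three NTP steps can also be scripted. This sketch assumes that "unwanted servers" means every existing line that starts with server; the hypothetical function configure_ntp_servers comments those lines out instead of deleting them, then appends the data center servers followed by mdw and smdw, preserving the order described above. It edits the file named in its first argument, so try it on a copy of /etc/ntp.conf first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out existing "server" lines in an ntp.conf-style
# file, then append the given data center servers followed by mdw and smdw.
# Usage: configure_ntp_servers <file> <ntp-server>...
configure_ntp_servers() {
  local conf=$1
  shift
  # Keep the original lines for reference, but disable them (GNU sed -i).
  sed -i -e 's/^server /#server /' "$conf"
  local srv
  for srv in "$@"; do
    echo "server ${srv}" >> "$conf"
  done
  # The master and standby master go after the data center servers.
  echo "server mdw" >> "$conf"
  echo "server smdw" >> "$conf"
}
```

For example, with placeholder server names: cp /etc/ntp.conf /tmp/ntp.conf.test && configure_ntp_servers /tmp/ntp.conf.test ntp1.example.com ntp2.example.com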
      
  8. Configure kernel settings so the system is optimized for Greenplum Database.

    1. Create the configuration file /etc/sysctl.d/10-gpdb.conf and paste the following kernel optimization parameters:

      kernel.msgmax = 65536
      kernel.msgmnb = 65536
      kernel.msgmni = 2048
      kernel.sem = 500 2048000 200 40960
      kernel.shmmni = 1024
      kernel.sysrq = 1
      net.core.netdev_max_backlog = 2000
      net.core.rmem_max = 4194304
      net.core.wmem_max = 4194304
      net.core.rmem_default = 4194304
      net.core.wmem_default = 4194304
      net.ipv4.tcp_rmem = 4096 4224000 16777216
      net.ipv4.tcp_wmem = 4096 4224000 16777216
      net.core.optmem_max = 4194304
      net.core.somaxconn = 10000
      net.ipv4.ip_forward = 0
      net.ipv4.tcp_congestion_control = cubic
      net.ipv4.tcp_tw_recycle = 0
      net.core.default_qdisc = fq_codel
      net.ipv4.tcp_mtu_probing = 0
      net.ipv4.conf.all.arp_filter = 1
      net.ipv4.conf.default.accept_source_route = 0
      net.ipv4.ip_local_port_range = 10000 65535
      net.ipv4.tcp_max_syn_backlog = 4096
      net.ipv4.tcp_syncookies = 1
      vm.overcommit_memory = 2
      vm.overcommit_ratio = 95
      vm.swappiness = 10
      vm.dirty_expire_centisecs = 500
      vm.dirty_writeback_centisecs = 100
      vm.zone_reclaim_mode = 0
      
    2. Add the following parameters. Some of the values depend on the virtual machine settings that you calculated in the Sizing section.

      1. Determine the amount of RAM in bytes by setting the variable $RAM_IN_BYTES. For example, for a virtual machine with 30 GB of RAM, run the following:

        $ RAM_IN_BYTES=$((30 * 1024 * 1024 * 1024))
        
      2. Define the following parameters that depend on the variable $RAM_IN_BYTES that you just created, and append them to the file /etc/sysctl.d/10-gpdb.conf by running the following commands:

        $ echo "vm.min_free_kbytes = $(($RAM_IN_BYTES * 3 / 100 / 1024))" >> /etc/sysctl.d/10-gpdb.conf
        $ echo "kernel.shmall = $(($RAM_IN_BYTES / 2 / 4096))" >> /etc/sysctl.d/10-gpdb.conf
        $ echo "kernel.shmmax = $(($RAM_IN_BYTES / 2))" >> /etc/sysctl.d/10-gpdb.conf
        
      3. If your virtual machine RAM is less than or equal to 64 GB, run the following commands:

        $ echo "vm.dirty_background_ratio = 3" >> /etc/sysctl.d/10-gpdb.conf
        $ echo "vm.dirty_ratio = 10" >> /etc/sysctl.d/10-gpdb.conf
        
      4. If your virtual machine RAM is greater than 64 GB, run the following commands:

        $ echo "vm.dirty_background_ratio = 0" >> /etc/sysctl.d/10-gpdb.conf
        $ echo "vm.dirty_ratio = 0" >> /etc/sysctl.d/10-gpdb.conf
        # 1.5 GB (keep comments on their own lines in sysctl configuration files)
        $ echo "vm.dirty_background_bytes = 1610612736" >> /etc/sysctl.d/10-gpdb.conf
        # 4 GB
        $ echo "vm.dirty_bytes = 4294967296" >> /etc/sysctl.d/10-gpdb.conf
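
The RAM-dependent steps above can be collected into one function. This sketch assumes a hypothetical helper named gpdb_memory_sysctl that takes the RAM size in whole GB, applies the same formulas and the same 64 GB threshold, and prints the lines instead of appending them, so that you can inspect the result first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the RAM-dependent sysctl lines from the steps
# above for a VM with the given amount of RAM in GB (whole number).
gpdb_memory_sysctl() {
  local ram_gb=$1
  local ram_in_bytes=$((ram_gb * 1024 * 1024 * 1024))
  echo "vm.min_free_kbytes = $((ram_in_bytes * 3 / 100 / 1024))"
  echo "kernel.shmall = $((ram_in_bytes / 2 / 4096))"
  echo "kernel.shmmax = $((ram_in_bytes / 2))"
  if [ "$ram_gb" -le 64 ]; then
    echo "vm.dirty_background_ratio = 3"
    echo "vm.dirty_ratio = 10"
  else
    # Above 64 GB, switch from ratios to absolute byte limits.
    echo "vm.dirty_background_ratio = 0"
    echo "vm.dirty_ratio = 0"
    echo "vm.dirty_background_bytes = 1610612736"
    echo "vm.dirty_bytes = 4294967296"
  fi
}

gpdb_memory_sysctl 30
```

After checking the output, you could append it with gpdb_memory_sysctl 30 >> /etc/sysctl.d/10-gpdb.conf and load all sysctl configuration files with sysctl --system.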
        
  9. Configure SSH to allow passwordless login.

    1. Edit the /etc/ssh/sshd_config file and update the following options:

      PasswordAuthentication yes
      ChallengeResponseAuthentication yes
      UsePAM yes
      MaxStartups 100
      MaxSessions 100
      
    2. Create ssh keys to allow passwordless login with root by running the following commands:

      # Generate the keys without a passphrase: press Enter to accept the defaults
      $ ssh-keygen
      $ chmod 700 /root/.ssh
      # Copy the public key to authorized_keys
      $ cd /root/.ssh/
      $ cat id_rsa.pub > authorized_keys
      $ chmod 600 authorized_keys
      # Add the host key of localhost to known_hosts
      $ ssh-keyscan -t rsa localhost > known_hosts
      # Duplicate the host key entry for all hosts in the cluster
      $ key=$(cat known_hosts)
      $ for i in mdw smdw $(seq -f "sdw%g" 1 64); do
          echo "${key}" | sed -e "s/localhost/${i}/" >> known_hosts
        done
      $ chmod 644 known_hosts
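
The duplication loop relies on every virtual machine cloned from this template keeping the template's SSH host key, so one scanned entry is valid for all hosts. The sketch below is a variant of that loop under the same assumption: the hypothetical function expand_known_hosts quotes the scanned line and replaces only its first field (the host name), leaving the key material untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given one known_hosts line scanned for "localhost",
# print a copy of it for every host name passed as an argument.
# Usage: expand_known_hosts "<known_hosts line>" <host>...
expand_known_hosts() {
  local line=$1
  shift
  local host
  for host in "$@"; do
    # ${line#* } drops the first field (the host name) and its space.
    echo "${host} ${line#* }"
  done
}
```

For example: expand_known_hosts "$(head -n 1 known_hosts)" mdw smdw $(seq -f "sdw%g" 1 64) >> known_hosts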
      
  10. Configure the system resource limits to control the amount of resources used by Greenplum by creating the file /etc/security/limits.d/20-nproc.conf.

    1. Ensure that the directory exists before creating the file:

      $ mkdir -p /etc/security/limits.d
      
    2. Append the following contents to the end of /etc/security/limits.d/20-nproc.conf:

      * soft nofile 524288
      * hard nofile 524288
      * soft nproc 131072
      * hard nproc 131072
      
  11. Create the base mount point /gpdata for the virtual machine data drive:

    $ mkdir -p /gpdata
    $ mkfs.xfs /dev/sdb
    $ mount -t xfs -o rw,noatime,nodev,inode64 /dev/sdb /gpdata/
    $ df -kh
    $ echo /dev/sdb /gpdata/ xfs rw,nodev,noatime,inode64 0 0 >> /etc/fstab
    $ mkdir -p /gpdata/primary
    $ mkdir -p /gpdata/mirror
    $ mkdir -p /gpdata/master
    
  12. Add the following settings to /etc/rc.local so that they persist across reboots.

    1. Append the following lines to the file:

      # Configure readahead for the `/dev/sdb` to 16384 512-byte sectors, i.e. 8MiB
      /sbin/blockdev --setra 16384 /dev/sdb
      # Configure gp-virtual-internal network settings with MTU 9000
      /sbin/ip link set ens192 mtu 9000
      # Configure jumbo frame RX ring buffer to 4096
      /sbin/ethtool --set-ring ens192 rx-jumbo 4096
      
    2. Make the file executable:

      $ chmod +x /etc/rc.d/rc.local
      
  13. Create the gpadmin group and the gpadmin user required by Greenplum Database.

    1. Run the following commands to create the user gpadmin in the group gpadmin:

      $ groupadd gpadmin
      $ useradd -g gpadmin -m gpadmin
      $ passwd gpadmin
      # Enter the desired password at the prompt
      
    2. (Optional) Change the root password to a preferred password:

      $ passwd root
      # Enter the desired password at the prompt
      
    3. Create (or overwrite) the file /home/gpadmin/.bashrc for gpadmin with the following content:

      ### .bashrc
      
      ### Source global definitions
      if [ -f /etc/bashrc ]; then
          . /etc/bashrc
      fi
      
      ### User specific aliases and functions
      
      ### If Greenplum has been installed, then add Greenplum-specific commands to the path
      if [ -f /usr/local/greenplum-db/greenplum_path.sh ]; then
          source /usr/local/greenplum-db/greenplum_path.sh
      fi
      
    4. Change the ownership of /home/gpadmin/.bashrc to gpadmin:gpadmin:

      $ chown gpadmin:gpadmin /home/gpadmin/.bashrc
      
    5. Change the ownership of the /gpdata directory to gpadmin:gpadmin:

      $ chown -R gpadmin:gpadmin /gpdata
      
    6. Create SSH keys for passwordless login as the gpadmin user:

      $ su - gpadmin
      # Generate the keys without a passphrase: press Enter to accept the defaults
      $ ssh-keygen
      $ chmod 700 /home/gpadmin/.ssh
      # Copy the public key to authorized_keys
      $ cd /home/gpadmin/.ssh/
      $ cat id_rsa.pub > authorized_keys
      $ chmod 600 authorized_keys
      # Add the host key of localhost to known_hosts
      $ ssh-keyscan -t rsa localhost > known_hosts
      # Duplicate the host key entry for all hosts in the cluster
      $ key=$(cat known_hosts)
      $ for i in mdw smdw $(seq -f "sdw%g" 1 64); do
          echo "${key}" | sed -e "s/localhost/${i}/" >> known_hosts
        done
      $ chmod 644 known_hosts
      
    7. Log out of gpadmin to go back to root before you proceed to the next step.

  14. Configure cgroups for Greenplum.

    Greenplum Database resource groups use Linux control groups (cgroups) for resource management.

    1. Install the cgroup configuration package:

      $ yum install -y libcgroup-tools
      
    2. Ensure that the directory /etc/cgconfig.d exists:

      $ mkdir -p /etc/cgconfig.d
      
    3. Create the cgroups configuration file /etc/cgconfig.d/10-gpdb.conf for Greenplum:

      group gpdb {
          perm {
              task {
                  uid = gpadmin;
                  gid = gpadmin;
              }
              admin {
                  uid = gpadmin;
                  gid = gpadmin;
              }
          }
          cpu {
          }
          cpuacct {
          }
          cpuset {
          }
          memory {
          }
      }
      
    4. Apply the configuration file and enable the cgconfig service with systemctl:

      $ cgconfigparser -l /etc/cgconfig.d/10-gpdb.conf
      $ systemctl enable cgconfig.service
      
  15. Update the /etc/hosts file with all of the IP addresses and hostnames in the network gp-virtual-internal.

    1. Verify that you have the following parameters defined:

      • The total number of segment virtual machines you want to deploy; the default is 64.
      • The last octet of the master virtual machine's IP address in the gp-virtual-internal port group; the default is 250.
      • The leading octets of the gp-virtual-internal network IP range; the default is 192.168.1.
      • With these defaults, segment IP addresses start from 192.168.1.2, the master IP address is 192.168.1.250, and the standby master IP address is 192.168.1.251.
    2. Run the following commands, replacing the values based on your environment:

      $ echo '192.168.1.250 mdw' >> /etc/hosts
      $ echo '192.168.1.251 smdw' >> /etc/hosts
      $ for i in {1..64}; do
          echo "192.168.1.$((i+1)) sdw${i}" >> /etc/hosts
        done
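
The loop above hardcodes the default values. The following sketch parameterizes them instead: gp_internal_hosts is a hypothetical name, its defaults come from the list above, and the check that the segment range stays below the master address is an added safeguard, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts entries for gp-virtual-internal.
#   $1 leading octets      (default 192.168.1.)
#   $2 segment VM count    (default 64)
#   $3 last octet of mdw   (default 250; smdw gets the next address)
gp_internal_hosts() {
  local prefix=${1:-192.168.1.} count=${2:-64} master=${3:-250}
  # Segments use .2 through .(count+1); refuse ranges that reach the master.
  if [ $((count + 1)) -ge "$master" ]; then
    echo "segment range overlaps ${prefix}${master}" >&2
    return 1
  fi
  echo "${prefix}${master} mdw"
  echo "${prefix}$((master + 1)) smdw"
  local i
  for i in $(seq 1 "$count"); do
    echo "${prefix}$((i + 1)) sdw${i}"
  done
}
```

Review the output before appending it, for example with gp_internal_hosts 192.168.1. 64 250 >> /etc/hosts.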
    
  16. Create two files hosts-all and hosts-segments under /home/gpadmin. Replace 64 with your number of segment virtual machines if applicable:

    $ echo mdw > /home/gpadmin/hosts-all
    $ echo smdw >> /home/gpadmin/hosts-all
    $ > /home/gpadmin/hosts-segments
    $ for i in {1..64}; do
        echo  "sdw${i}" >> /home/gpadmin/hosts-all
        echo  "sdw${i}" >> /home/gpadmin/hosts-segments
      done
    $ chown gpadmin:gpadmin /home/gpadmin/hosts*
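
The same two files can be produced from one segment list, so that hosts-all is always hosts-segments plus the two masters. This is a sketch with a hypothetical function name, write_gp_host_files, that takes the target directory and the number of segment virtual machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write hosts-all and hosts-segments into a directory.
# Usage: write_gp_host_files <directory> <segment-count>
write_gp_host_files() {
  local dir=$1 count=$2
  # Segment list: sdw1 ... sdwN, one per line.
  seq -f "sdw%g" 1 "$count" > "${dir}/hosts-segments"
  # All hosts: master and standby master first, then the segments.
  { echo mdw; echo smdw; cat "${dir}/hosts-segments"; } > "${dir}/hosts-all"
}
```

For example: write_gp_host_files /home/gpadmin 64 && chown gpadmin:gpadmin /home/gpadmin/hosts*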
    

Installing the Greenplum Database Software

  1. Download the latest version of the Greenplum Database Server 6 for RHEL 7 from VMware Tanzu Network.

  2. Copy the downloaded RPM file to the virtual machine, then log in and install Greenplum:

    $ scp greenplum-db-6.*.rpm root@greenplum-db-template-vm:/tmp
    $ ssh root@greenplum-db-template-vm
    $ yum install -y /tmp/greenplum-db-6.*.rpm
    
  3. Install the following yum packages for better supportability:

    • dstat to monitor system statistics, like network and I/O performance.
    • sos to generate an sosreport, a best practice to collect system information for support purposes.
    • tree to visualize folder structure.
    • wget to easily get artifacts from the Internet.
    $ yum install -y dstat
    $ yum install -y sos
    $ yum install -y tree
    $ yum install -y wget
    
  4. Power down the virtual machine:

    $ shutdown now
    
  5. Enable vApp options in vCenter:

    • Select the VM greenplum-db-template-vm
    • In the VM view, click the Configure tab at the top of the page
    • If vApp Options is disabled, click EDIT...
      • Click Enable vApp options
      • Click OK
  6. Add vApp option guestinfo.segment_count:

    • Select Settings > vApp Options
    • Under Properties, click ADD
    • In the General tab, enter the following:
      • For Category, enter Greenplum
      • For Label, enter Number of Segments
      • For Key ID, enter guestinfo.segment_count
    • In the Type tab, enter the following:
      • For Type, select Integer
      • For Range, enter 2-248
    • Click Save
    • Select the new property
    • Click Set Value, and enter an appropriate value, for example: 64
  7. Add vApp option guestinfo.internal_ip_cidr:

    • Under Properties, click ADD again
    • In the General tab, enter the following:
      • For Category, enter Internal Network
      • For Label, enter Internal Network CIDR (with netmask /24)
      • For Key ID, enter guestinfo.internal_ip_cidr
    • In the Type tab, enter the following:
      • For Type, select String
      • For Length, enter 12-18
    • Click Save
    • Select the new property
    • Click Set Value, and enter an appropriate value, for example: 192.168.10.1/24

Creating the Greenplum Template

Clone the newly created and configured virtual machine to a template; all the virtual machines in the Greenplum Database cluster will be created from this template.

  1. Log in to vCenter and navigate to Hosts and Clusters.
  2. Right-click the greenplum-db-template-vm virtual machine.
  3. Select Clone > Clone to Template.
  4. Enter the template name as greenplum-db-template, then click Next.
  5. Select your cluster, then click Next.
  6. Select the vSAN datastore and the appropriate VM Storage Policy that you configured in Setting Up vSphere Storage or Setting Up vSphere Encryption, then click Next.
  7. Review your configuration, then click Finish.

Validating the Template

Validate the newly created template by deploying a test virtual machine from it and verifying that all settings are configured correctly.

Creating a Test Virtual Machine

  1. Log in to vCenter and navigate to VMs and Templates.
  2. Right-click the Greenplum template greenplum-db-template and select New VM from this Template.
  3. Enter a name for the virtual machine and click Next.
  4. Select your cluster, then click Next.
  5. Select the vSAN datastore and select Keep existing VM storage policies for VM Storage Policy, then click Next.
  6. Select Power on virtual machine after creation, then click Next.
  7. Review your configuration, then click Finish.

Verifying the Test Virtual Machine Settings

  1. Log in to the virtual machine as root.

  2. Verify that the following services are disabled:

    1. SELinux

      $ sestatus
      SELinux status:                 disabled
      
    2. Firewall

      $ systemctl status firewalld
      firewalld.service
      Loaded: masked (/dev/null; bad)
      Active: inactive (dead)
      
    3. Tuned

      $ systemctl status tuned
      tuned.service
      Loaded: masked (/dev/null; bad)
      Active: inactive (dead)
      
    4. Chrony

      $ systemctl status chronyd
      chronyd.service
      Loaded: masked (/dev/null; bad)
      Active: inactive (dead)
      
  3. Verify that ntpd is installed and enabled:

    $ systemctl status ntpd
    ntpd.service - Network Time Service
        Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
        Active: active (running) since Tue 2021-05-04 18:47:25 EDT; 4s ago
    
  4. Verify that the NTP servers are configured correctly and the remote servers are ordered properly:

    $ ntpq -pn
    
         remote           refid         st t when poll reach   delay   offset  jitter
    =================================================================================
    -xx.xxx.xxx.xxx   xx.xxx.xxx.xxx     3 u  246  256  377    0.186    2.700   0.993
    +xx.xxx.xxx.xxx   xx.xxx.xxx.xxx     3 u  223  256  377   26.508    0.247   0.397
    
  5. Verify that the filesystem configuration is correct:

    $ lsblk /dev/sdb
    NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sdb    8:16   0  250G  0 disk /gpdata/
    
    $ grep sdb /etc/fstab
    /dev/sdb /gpdata/ xfs rw,nodev,noatime,inode64 0 0
    
    $ df -Th | grep sdb
    /dev/sdb                xfs       250G  167M  250G   1% /gpdata
    
    $ ls -l /gpdata
    total 0
    drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 master
    drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 mirror
    drwxrwxr-x 2 gpadmin gpadmin 6 Jun 10 15:20 primary
    
  6. Verify that the parameters transparent_hugepage=never and elevator=deadline exist:

    $ cat /proc/cmdline
    BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto spectre_v2=retpoline rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 transparent_hugepage=never elevator=deadline
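
Scanning the command line by eye is error-prone, because a parameter can be present with a different value. The sketch below, using the hypothetical function has_kernel_args, succeeds only if each expected parameter appears as a whole word; the sample string is a shortened form of the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every expected parameter appears as a
# whole word in the given kernel command line.
has_kernel_args() {
  local cmdline=" $1 " arg
  shift
  for arg in "$@"; do
    case "$cmdline" in
      *" ${arg} "*) ;;                             # present
      *) echo "missing: ${arg}" >&2; return 1 ;;
    esac
  done
}

sample='BOOT_IMAGE=/vmlinuz-3.10.0-1160.el7.x86_64 ro rhgb quiet transparent_hugepage=never elevator=deadline'
has_kernel_args "$sample" transparent_hugepage=never elevator=deadline && echo "all parameters present"
```

On the test virtual machine, you could run has_kernel_args "$(cat /proc/cmdline)" transparent_hugepage=never elevator=deadline.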
    
  7. Verify that the ulimit settings match your specification by running the following command:

    $ ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 119889
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 524288
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 131072
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
    
  8. Verify that the necessary yum packages are installed by running rpm -qa:

    $ rpm -qa | grep apr
    $ rpm -qa | grep apr-util
    $ rpm -qa | grep dstat
    $ rpm -qa | grep greenplum-db-6
    $ rpm -qa | grep krb5-devel
    $ rpm -qa | grep libcgroup-tools
    $ rpm -qa | grep libevent
    $ rpm -qa | grep libyaml
    $ rpm -qa | grep net-tools
    $ rpm -qa | grep ntp
    $ rpm -qa | grep perl
    $ rpm -qa | grep rsync
    $ rpm -qa | grep sos
    $ rpm -qa | grep tree
    $ rpm -qa | grep wget
    $ rpm -qa | grep which
    $ rpm -qa | grep zip
    
  9. Verify that you configured the Greenplum Database cgroups correctly by running the commands below.

    1. Identify the cgroup directory mount point:

      $ grep cgroup /proc/mounts
      

      The first line from the above output identifies the cgroup mount point. For example, /sys/fs/cgroup.

    2. Run the following commands, replacing <cgroup_mount_point> with the mount point which you identified in the previous step:

      $ ls -l <cgroup_mount_point>/cpu/gpdb
      $ ls -l <cgroup_mount_point>/cpuacct/gpdb
      $ ls -l <cgroup_mount_point>/cpuset/gpdb
      $ ls -l <cgroup_mount_point>/memory/gpdb
      

      The above directories must exist and must be owned by gpadmin:gpadmin.

    3. Verify that the cgconfig service is running by executing the following command:

      $ systemctl status cgconfig.service
      
  10. Verify that the sysctl settings have been applied correctly based on your virtual machine settings.

    1. First, define the variable $RAM_IN_BYTES again on this virtual machine. For example, for a virtual machine with 30 GB of RAM:

      $ RAM_IN_BYTES=$((30 * 1024 * 1024 * 1024))
      
    2. Retrieve each value listed below by running sysctl <kernel setting>, and confirm that it matches the expected value for that setting. To evaluate an arithmetic expression, run it with echo; for example: echo $(($RAM_IN_BYTES / 2)).

      Kernel Setting                  Expected Value
      vm.min_free_kbytes              $(($RAM_IN_BYTES * 3 / 100 / 1024))
      vm.overcommit_memory            2
      vm.overcommit_ratio             95
      net.ipv4.ip_local_port_range    10000 65535
      kernel.shmall                   $(($RAM_IN_BYTES / 2 / 4096))
      kernel.shmmax                   $(($RAM_IN_BYTES / 2))

    3. For a virtual machine with 64 GB of RAM or less:

      Kernel Setting                  Expected Value
      vm.dirty_background_ratio       3
      vm.dirty_ratio                  10

    4. For a virtual machine with more than 64 GB of RAM:

      Kernel Setting                  Expected Value
      vm.dirty_background_ratio       0
      vm.dirty_ratio                  0
      vm.dirty_background_bytes       1610612736
      vm.dirty_bytes                  4294967296
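
The comparison can be automated. This sketch defines the hypothetical functions check_setting and verify_gpdb_sysctl: the first compares one expected value with one actual value after normalizing whitespace (sysctl -n separates the two numbers of net.ipv4.ip_local_port_range with a tab), and the second reads the live values for the first table. Add the vm.dirty_* rows that apply to your RAM size.

```shell
#!/usr/bin/env bash
# Collapse runs of whitespace to single spaces and trim both ends.
normalize() {
  printf '%s' "$1" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//'
}

# Hypothetical helper: compare one expected value with the actual value.
check_setting() {
  local name=$1 expected actual
  expected=$(normalize "$2")
  actual=$(normalize "$3")
  if [ "$expected" = "$actual" ]; then
    echo "OK        ${name} = ${actual}"
  else
    echo "MISMATCH  ${name}: expected '${expected}', got '${actual}'"
    return 1
  fi
}

# Hypothetical driver for the first table; pass the RAM size in bytes.
verify_gpdb_sysctl() {
  local ram=$1 status=0 pair name
  for pair in \
    "vm.min_free_kbytes|$((ram * 3 / 100 / 1024))" \
    "vm.overcommit_memory|2" \
    "vm.overcommit_ratio|95" \
    "net.ipv4.ip_local_port_range|10000 65535" \
    "kernel.shmall|$((ram / 2 / 4096))" \
    "kernel.shmmax|$((ram / 2))"; do
    name=${pair%%|*}
    check_setting "$name" "${pair#*|}" "$(sysctl -n "$name")" || status=1
  done
  return "$status"
}
```

On the test virtual machine, run verify_gpdb_sysctl "$RAM_IN_BYTES"; it returns a non-zero status if any setting differs.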
  11. Verify that SSH allows passwordless login as the gpadmin user, without prompting for a password:

    $ su - gpadmin
    $ ssh localhost
    $ exit
    $ exit
    
  12. Verify the readahead value:

    $ /sbin/blockdev --getra /dev/sdb
    16384
    
  13. Verify the RX jumbo ring buffer setting:

    $ /sbin/ethtool -g ens192 | grep Jumbo
    RX Jumbo: 4096
    RX Jumbo: 4096
    
  14. Verify the MTU size:

    $ /sbin/ip a | grep 9000
    2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 1000
    

Allocating the Virtual Machines with Terraform

Provisioning the Virtual Machines

Use the Terraform software that you installed in Creating the Jumpbox Virtual Machine to generate copies of the template virtual machine that you just created. Terraform configures the copies based on the number of virtual machines in your environment, the IP address ranges, and the other settings that you specify in the installation script.

  1. Download the main.tf file, and copy it with scp to the home directory of the root user on the jumpbox.

  2. Log in to the jumpbox virtual machine as root.

  3. Update the following variables under the Terraform variables section of the main.tf script with the correct values for your environment. You collected the required information in the Prerequisites section.

    Variable                            Description
    vsphere_user                        The name of the vSphere administrator-level user.
    vsphere_password                    The password of the vSphere administrator-level user.
    vsphere_server                      The IP address or, preferably, the fully qualified domain name (FQDN) of your vCenter server.
    vsphere_datacenter                  The name of the data center for Greenplum in your vCenter environment.
    vsphere_compute_cluster             The name of the compute cluster for Greenplum in your data center.
    vsphere_datastore                   The name of the vSAN datastore that will contain your Greenplum data.
    vsphere_storage_policy              The name of the storage policy defined in Setting Up vSphere Storage or Setting Up vSphere Encryption.
    gp_virtual_external_ipv4_addresses  The routable IP addresses for mdw and smdw, in that order; for example: ["10.0.0.111", "10.0.0.112"].
    gp_virtual_external_ipv4_netmask    The number of bits in the netmask for gp-virtual-external; for example: 24.
    gp_virtual_external_gateway         The gateway IP address for the gp-virtual-external network.
    dns_servers                         The DNS servers for the gp-virtual-external network, listed as an array; for example: ["8.8.8.8", "8.8.4.4"].
    gp_virtual_etl_bar_ipv4_cidr        The CIDR of the non-routable ETL, backup, and restore network gp-virtual-etl-bar; for example: "192.168.2.0/24".

     

  4. Initialize Terraform:

    $ terraform init
    

    You should get the following output:

    Terraform has been successfully initialized!
    
    You may now begin working with Terraform. Try running "terraform plan" to see
    any changes that are required for your infrastructure. All Terraform commands
    should now work.
    
    If you ever set or change modules or backend configuration for Terraform,
    re-run this command to reinitialize your working directory. If you forget, other
    commands will detect it and remind you to do so if necessary.
    
  5. Verify that your Terraform configuration is correct by running the following command:

    $ terraform plan
    
  6. Deploy the cluster:

    $ terraform apply
    

    Answer yes to the following prompt:

    Do you want to perform these actions?
      Terraform will perform the actions described above.
      Only 'yes' will be accepted to approve.
    
        Enter a value: yes
    

The virtual machines will be created and configured to deploy your Greenplum cluster. You can check the progress under the Recent Tasks panel on your vSphere client.

When Terraform completes, it generates a file named terraform.tfstate. Do not delete this file: it records all of the virtual machines and their states, and Terraform uses it whenever it modifies any of them. We also recommend keeping a snapshot of the jumpbox virtual machine.

Terraform timeout

Occasionally, Terraform may time out when deploying the virtual machines. If a virtual machine cannot be cloned within the timeout value, by default 30 minutes, Terraform will fail and the cluster setup will be incomplete. Terraform will report the following error:

error cloning virtual machine: timeout waiting for clone to complete

Investigate the root cause of the issue in the vCenter environment: check host and storage performance to find out why a virtual machine takes more than 30 minutes to clone. You can work around the issue by changing Terraform settings in either of two ways:

  1. Reduce the Terraform parallelism from the default of 10 to 5, and redeploy the cluster by running the following command:

    $ terraform apply -parallelism=5
    
  2. Increase the Terraform timeout property, set in minutes. See more about this property in the Terraform documentation.

    Modify the main.tf script in two places, the segment_hosts resource and the master_hosts resource, by adding the timeout property under the clone block:

    ...
    resource "vsphere_virtual_machine" "segment_hosts" {
    ...
    
      clone {
    ...
        timeout = 40
    ...
      }
    }
    
    resource "vsphere_virtual_machine" "master_hosts" {
    ...
      clone {
    ...
        timeout = 40
    ...
      }
    }
    

    After saving the changes, rerun terraform apply to redeploy the cluster.

Validating the Deployment

Once Terraform has provisioned the virtual machines, perform the following validation steps:

  1. Validate the Resource Pool for the Greenplum cluster.

    1. Log in to vCenter and navigate to Hosts and Clusters.

    2. Select the newly created resource pool and verify that the Resource Settings are as below:

      [Figure: expected Resource Settings for the Greenplum resource pool]
      Note that the Worst Case Allocation fields will differ depending on what is currently running in your environment.

    3. Click the expand arrow next to the resource pool name. You should see all of the newly created virtual machines: gp-1-mdw, gp-1-smdw, gp-1-sdw1, and so on.

  2. Validate that the gp-virtual-internal network is working.

    1. Log in to the master node as root.

    2. Switch to the gpadmin user.

      $ su - gpadmin
      
    3. Make sure that the file /home/gpadmin/hosts-all exists.

    4. Use the gpssh command to verify connectivity to all nodes in the gp-virtual-internal network.

      $ gpssh -f hosts-all -e hostname
      
  3. Validate the MTU settings on all virtual machines.

    1. Log in to the master node as root.

    2. Use the gpssh command to verify the value of the MTU.

      $ source /usr/local/greenplum-db/greenplum_path.sh
      $ gpssh -f /home/gpadmin/hosts-all -e "ifconfig ens192 | grep -i mtu"
      
  4. Clean up the temporary vSphere administrator account.

    If you created a temporary vSphere administrator-level user, such as greenplum, you can safely remove it now.

Deploying Greenplum

You are now ready to deploy Greenplum Database on the newly deployed cluster. Perform the steps below from the Greenplum master node.

Deploying a Greenplum Database Cluster

  1. Initialize the Greenplum cluster.

    1. Log in to the Greenplum master node as gpadmin user.

    2. Create the Greenplum configuration script create_gpinitsystem_config.sh and paste the following contents:

      #!/bin/bash
      # setup the gpinitsystem config
      primaryArray() {
        numOfSegments=$1
        array=""
        newline=$'\n'
        for i in $(seq 1 ${numOfSegments}); do
          array+="sdw$(($i*2-1))~sdw$(($i*2-1))~6000~/gpdata/primary/gpseg$(($i-1))~$(($i*2))~$(($i-1))${newline}"
        done
        echo "${array}"
      }
      mirrorArray() {
        numOfSegments=$1
        array=""
        newline=$'\n'
        for i in $(seq 1 ${numOfSegments}); do
          array+="sdw$(($i*2))~sdw$(($i*2))~7000~/gpdata/mirror/gpseg$(($i-1))~$(($i*2+1))~$(($i-1))${newline}"
        done
        echo "${array}"
      }
      create_gpinitsystem_config() {
      echo "Generate gpinitsystem"
      cat <<EOF> ./gpinitsystem_config
      ARRAY_NAME="Greenplum Data Platform"
      TRUSTED_SHELL=ssh
      CHECK_POINT_SEGMENTS=8
      ENCODING=UNICODE
      SEG_PREFIX=gpseg
      HEAP_CHECKSUM=on
      HBA_HOSTNAMES=0
      QD_PRIMARY_ARRAY=mdw~mdw~5432~/gpdata/master/gpseg-1~1~-1
      numTotalSegments=$1
      declare -a PRIMARY_ARRAY=(
      $( primaryArray $((${numTotalSegments}/2)) )
      )
      declare -a MIRROR_ARRAY=(
      $( mirrorArray $((${numTotalSegments}/2)) )
      )
      EOF
      }
      numTotalSegments=$1
      if [ -z "$numTotalSegments" ]; then
        echo "Usage: bash create_gpinitsystem_config.sh <num_total_segments>"
      else
        create_gpinitsystem_config ${numTotalSegments}
      fi
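
Each generated PRIMARY_ARRAY and MIRROR_ARRAY entry packs six fields separated by ~. Assuming the Greenplum 6 gpinitsystem input-file field order host~address~port~data_directory~dbid~content_id (verify it against the gpinitsystem reference for your release), the hypothetical helper below splits one entry so that you can sanity-check the generated file. For example, the script places the primary of content 0 on sdw1 with dbid 2 and its mirror on sdw2 with dbid 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fields of one gpinitsystem array entry.
# Assumed field order: host~address~port~data_directory~dbid~content_id
describe_segment_entry() {
  local host address port datadir dbid content
  IFS='~' read -r host address port datadir dbid content <<< "$1"
  echo "content=${content} dbid=${dbid} host=${host} address=${address} port=${port} dir=${datadir}"
}

describe_segment_entry "sdw1~sdw1~6000~/gpdata/primary/gpseg0~2~0"
describe_segment_entry "sdw2~sdw2~7000~/gpdata/mirror/gpseg0~3~0"
```

After generating the file, you could list every entry with grep '^sdw' gpinitsystem_config | while read -r entry; do describe_segment_entry "$entry"; done.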
      
    3. Run the script to generate the configuration file for gpinitsystem. Replace 64 with the number of segments in your environment:

      $ bash create_gpinitsystem_config.sh 64
      

      You should now see a file called gpinitsystem_config.

    4. Run the following command to initialize the Greenplum Database:

      $ gpinitsystem -a -I gpinitsystem_config -s smdw
      
  2. Set the MASTER_DATA_DIRECTORY environment variable for gpadmin on the master and the standby master, then load it into the current shell:

    $ echo export MASTER_DATA_DIRECTORY=/gpdata/master/gpseg-1 >> ~/.bashrc
    $ ssh smdw 'echo export MASTER_DATA_DIRECTORY=/gpdata/master/gpseg-1 >> ~/.bashrc'
    $ source ~/.bashrc
    
  3. Configure the Greenplum cluster with the commands below. The memory-related values shown assume a virtual machine with 30 GB of RAM; adjust them for your RAM size.

    ### Interconnect Settings
    $ gpconfig -c gp_interconnect_queue_depth -v 16 
    $ gpconfig -c gp_interconnect_snd_queue_depth -v 16
    
    # Since you have one segment per VM and fewer competing workloads per VM,
    # you can set the memory limit for resource groups higher than the default
    $ gpconfig -c gp_resource_group_memory_limit -v 0.85 
    
    # This value should be 5% of the total RAM on the VM
    $ gpconfig -c statement_mem -v 1536MB 
    
    # This value should be set to 25% of the total RAM on the VM
    $ gpconfig -c max_statement_mem -v 7680MB 
    
    # This value should be set to 85% of the total RAM on the VM
    $ gpconfig -c gp_vmem_protect_limit -v 26112
    
    # Since you have less I/O bandwidth, you can turn this parameter on
    $ gpconfig -c gp_workfile_compression -v on 
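
The memory-related values above follow from the percentages in the comments. As a sketch, the hypothetical function gp_memory_settings derives them from the RAM size in whole GB using integer arithmetic; for 30 GB it reproduces 1536MB, 7680MB, and 26112.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the memory-related values used above from the
# VM RAM in GB, using the percentages stated in the comments
# (statement_mem 5%, max_statement_mem 25%, gp_vmem_protect_limit 85%).
gp_memory_settings() {
  local ram_mb=$(( $1 * 1024 ))
  echo "statement_mem=$(( ram_mb * 5 / 100 ))MB"
  echo "max_statement_mem=$(( ram_mb * 25 / 100 ))MB"
  # gp_vmem_protect_limit takes a number of MB without a unit suffix.
  echo "gp_vmem_protect_limit=$(( ram_mb * 85 / 100 ))"
}

gp_memory_settings 30
```

You could feed the output to gpconfig, for example: gp_memory_settings 30 | while IFS='=' read -r name value; do gpconfig -c "$name" -v "$value"; done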
    
  4. Restart the Greenplum cluster for the newly configured settings to take effect:

    $ gpstop -r
    

Next Steps

Now that the Greenplum Database has been deployed, follow the steps provided in Validating the Greenplum Installation to ensure Greenplum Database has been installed correctly.
