Configuring PXF for Secure HDFS

When Kerberos is enabled for your HDFS filesystem, PXF, as an HDFS client, requires a principal and keytab file to authenticate access to HDFS. To read or write files on a secure HDFS, you must create and deploy Kerberos principals and keytabs for PXF, and ensure that Kerberos authentication is enabled and functioning.

Prerequisites

Before you configure PXF for access to a secure HDFS filesystem, ensure that you have:

  • Initialized, configured, and started PXF as described in Configuring PXF, including enabling PXF and Hadoop user impersonation.

  • Enabled Kerberos for your Hadoop cluster per the instructions for your specific distribution and verified the configuration.

  • Verified that the HDFS configuration parameter dfs.block.access.token.enable is set to true. You can find this setting in the hdfs-site.xml configuration file on a host in your Hadoop cluster.

  • Noted the host name or IP address of each Greenplum Database segment host (<seghost>) and the Kerberos Key Distribution Center (KDC) <kdc-server> host.

  • Noted the name of the Kerberos <realm> in which your cluster resides.

Procedure

Perform the following steps to configure PXF for a secure HDFS. You will perform operations on the Kerberos KDC server and Greenplum Database segment hosts.

Perform the following steps on Kerberos KDC server host:

  1. Log in to the Kerberos KDC server as the root user.

    $ ssh root@<kdc-server>
    root@kdc-server$ 
    
  2. Distribute the /etc/krb5.conf Kerberos configuration file on the KDC server host to each segment host in your Greenplum Database cluster if not already present. For example:

    root@kdc-server$ scp /etc/krb5.conf seghost:/etc/krb5.conf
    
  3. Use the kadmin.local command to create a Kerberos PXF service principal for each Greenplum Database segment host. The service principal should be of the form gpadmin/<seghost>@<realm> where <seghost> is the DNS resolvable, fully-qualified hostname of the segment host system (output of the hostname -f command).

    For example, these commands create PXF service principals for the hosts named host1.example.com, host2.example.com, and host3.example.com in the Kerberos realm named EXAMPLE.COM:

    root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host1.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host2.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "addprinc -randkey -pw changeme gpadmin/host3.example.com@EXAMPLE.COM"
    
  4. Generate a keytab file for each PXF service principal that you created in the previous step. Save the keytab files in any convenient location (this example uses the directory /etc/security/keytabs). You will deploy the keytab files to their respective Greenplum Database segment host machines in a later step. For example:

    root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host1.service.keytab gpadmin/host1.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host2.service.keytab gpadmin/host2.example.com@EXAMPLE.COM"
    root@kdc-server$ kadmin.local -q "xst -norandkey -k /etc/security/keytabs/pxf-host3.service.keytab gpadmin/host3.example.com@EXAMPLE.COM"
    

    Repeat the xst command as necessary to generate a keytab for each PXF service principal that you created in the previous step.

  5. List the principals. For example:

    root@kdc-server$ kadmin.local -q "listprincs"
    
  6. Copy the keytab file for each PXF service principal to its respective segment host. For example, the following commands copy each principal generated in step 4 to the PXF default keytab directory on the segment host when PXF_CONF=/etc/pxf/usercfg:

    root@kdc-server$ scp /etc/security/keytabs/pxf-host1.service.keytab host1.example.com:/etc/pxf/usercfg/keytabs/pxf.service.keytab
    root@kdc-server$ scp /etc/security/keytabs/pxf-host2.service.keytab host2.example.com:/etc/pxf/usercfg/keytabs/pxf.service.keytab
    root@kdc-server$ scp /etc/security/keytabs/pxf-host3.service.keytab host3.example.com:/etc/pxf/usercfg/keytabs/pxf.service.keytab
    

    Note the file system location of the keytab file on each PXF host; you will need this information for a later configuration step.

  7. Change the ownership and permissions on the pxf.service.keytab files. The files must be owned and readable by only the gpadmin user. For example:

    root@kdc-server$ ssh host1.example.com chown gpadmin:gpadmin /etc/pxf/usercfg/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host1.example.com chmod 400 /etc/pxf/usercfg/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host2.example.com chown gpadmin:gpadmin /etc/pxf/usercfg/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host2.example.com chmod 400 /etc/pxf/usercfg/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host3.example.com chown gpadmin:gpadmin /etc/pxf/usercfg/keytabs/pxf.service.keytab
    root@kdc-server$ ssh host3.example.com chmod 400 /etc/pxf/usercfg/keytabs/pxf.service.keytab
    

Perform the following steps on each Greenplum Database segment host:

  1. Log in to the segment host. For example:

    $ ssh gpadmin@<seghost>
    
  2. Install the Kerberos client packages on each Greenplum Database segment host if they are not already installed. You must have superuser permissions to install operating system packages. For example:

    root@seghost$ rpm -qa | grep krb
    root@seghost$ yum install krb5-libs krb5-workstation
    
  3. Open the PXF pxf-env.sh user configuration file in the editor of your choice. For example, to open the file with vi when PXF_CONF=/etc/pxf/usercfg:

    gpadmin@seghost$ vi /etc/pxf/usercfg/conf/pxf-env.sh
    
  4. Update the PXF_KEYTAB and PXF_PRINCIPAL settings, if required. Specify the location of the keytab file and the Kerberos principal, substituting your realm. The default values for these settings are identified below:

    export PXF_KEYTAB="${PXF_CONF}/keytabs/pxf.service.keytab"
    export PXF_PRINCIPAL="gpadmin/_HOST@EXAMPLE.COM"
    

    PXF automatically replaces _HOST with the FQDN of the segment host.

  5. Restart PXF on the segment host:

    gpadmin@seghost$ $GPHOME/pxf/bin/pxf restart