In this example, you:
pgtestdb
, user pxfuser1
, and PXF JDBC connector server configuration pgsrvcfg
that you created in Example: Reading From and Writing to a PostgreSQL Database.pxfuser1
.pgsrvcfg
JDBC server configuration.Perform the following procedure to create PostgreSQL tables named customers
and orders
in the public
schema of the database named pgtestdb
, and grant the user named pxfuser1
all privileges on these tables:
Identify the host name and port of your PostgreSQL server.
Connect to the pgtestdb
PostgreSQL database as the postgres
user. For example, if your PostgreSQL server is running on the default port on the host named pserver
:
$ psql -U postgres -h pserver -d pgtestdb
Create a table named customers
and insert some data into this table:
CREATE TABLE customers(id int, name text, city text, state text);
INSERT INTO customers VALUES (111, 'Bill', 'Helena', 'MT');
INSERT INTO customers VALUES (222, 'Mary', 'Athens', 'OH');
INSERT INTO customers VALUES (333, 'Tom', 'Denver', 'CO');
INSERT INTO customers VALUES (444, 'Kate', 'Helena', 'MT');
INSERT INTO customers VALUES (555, 'Harry', 'Columbus', 'OH');
INSERT INTO customers VALUES (666, 'Kim', 'Denver', 'CO');
INSERT INTO customers VALUES (777, 'Erik', 'Missoula', 'MT');
INSERT INTO customers VALUES (888, 'Laura', 'Athens', 'OH');
INSERT INTO customers VALUES (999, 'Matt', 'Aurora', 'CO');
Create a table named orders
and insert some data into this table:
CREATE TABLE orders(customer_id int, amount int, month int, year int);
INSERT INTO orders VALUES (111, 12, 12, 2018);
INSERT INTO orders VALUES (222, 234, 11, 2018);
INSERT INTO orders VALUES (333, 34, 7, 2018);
INSERT INTO orders VALUES (444, 456, 111, 2018);
INSERT INTO orders VALUES (555, 56, 11, 2018);
INSERT INTO orders VALUES (666, 678, 12, 2018);
INSERT INTO orders VALUES (777, 12, 9, 2018);
INSERT INTO orders VALUES (888, 120, 10, 2018);
INSERT INTO orders VALUES (999, 120, 11, 2018);
Assign user pxfuser1
all privileges on tables customers
and orders
, and then exit the psql
subsystem:
GRANT ALL ON customers TO pxfuser1;
GRANT ALL ON orders TO pxfuser1;
\q
In this procedure you create a named query text file, add it to the pgsrvcfg
JDBC server configuration, and synchronize the PXF configuration to the Greenplum Database cluster.
This procedure will typically be performed by the Greenplum Database administrator.
Log in to the Greenplum Database coordinator host:
$ ssh gpadmin@<coordinator>
Navigate to the JDBC server configuration directory pgsrvcfg
. For example:
gpadmin@coordinator$ cd $PXF_BASE/servers/pgsrvcfg
Open a query text file named pg_order_report.sql
in a text editor and copy/paste the following query into the file:
SELECT c.name, c.city, sum(o.amount) AS total, o.month
FROM customers c JOIN orders o ON c.id = o.customer_id
WHERE c.state = 'CO'
GROUP BY c.name, c.city, o.month
Save the file and exit the editor.
Synchronize these changes to the PXF configuration to the Greenplum Database cluster:
gpadmin@coordinator$ pxf cluster sync
Perform the following procedure on your Greenplum Database cluster to create a PXF external table that references the query file that you created in the previous section, and then reads the query result data:
Create the PXF external table specifying the jdbc
profile. For example:
CREATE EXTERNAL TABLE pxf_queryres_frompg(name text, city text, total int, month int)
LOCATION ('pxf://query:pg_order_report?PROFILE=jdbc&SERVER=pgsrvcfg&PARTITION_BY=month:int&RANGE=1:13&INTERVAL=3')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
With this partitioning scheme, PXF will issue 4 queries to the remote SQL database, one query per quarter. Each query will return customer names and the total amount of all of their orders in a given month, aggregated per customer, per month, for each month of the target quarter. Greenplum Database will then combine the data into a single result set for you when you query the external table.
Display all rows of the query result:
SELECT * FROM pxf_queryres_frompg ORDER BY city, total;
name | city | total | month
------+--------+-------+-------
Matt | Aurora | 120 | 11
Tom | Denver | 34 | 7
Kim | Denver | 678 | 12
(3 rows)
Use column projection to display the order total per city:
SELECT city, sum(total) FROM pxf_queryres_frompg GROUP BY city;
city | sum
--------+-----
Aurora | 120
Denver | 712
(2 rows)
When you run this query, PXF requests and retrieves query results for only the city
and total
columns, reducing the amount of data sent back to Greenplum Database.
Provide additional filters and aggregations to filter the total
in PostgreSQL:
SELECT city, sum(total) FROM pxf_queryres_frompg
WHERE total > 100
GROUP BY city;
city | sum
--------+-----
Denver | 678
Aurora | 120
(2 rows)
In this example, PXF will add the WHERE
filter to the subquery. This filter is pushed to and run on the remote database system, reducing the amount of data that PXF sends back to Greenplum Database. The GROUP BY
aggregation, however, is not pushed to the remote and is performed by Greenplum.