OQL Aggregate Functions

This topic provide descriptions and limitations about the aggregate functions for queries in VMware Tanzu GemFire, and explanations of how they work and how they can be used.

The following aggregate functions are supported:

GROUP BY
MIN, MAX, COUNT, and COUNT over a DISTINCT expression
SUM and SUM over a DISTINCT expression
AVG, and AVG over a DISTINCT expression

GROUP BY

You can collect data across multiple entries and group the results by one or more columns through the usage of the GROUP BY statement. Note the following facts about its usage:

It can group results by one or more fields.
It returns a single record / entry per group.
It must always be placed before the ORDER BY clause, if any.
It can be used in conjunction with other aggregate functions: MIN, MAX, COUNT, SUM, and AVG.
It groups records using the selected fields if and only if the fields have identical data across entries.
It is required, whenever an aggregate function is used within a query with other selected fields, to also use GROUP BY.
If there are no other aggregate functions within the query, all fields included within a GROUP BY clause must also be part of the original projection list, and all fields included within the projection list must also be part of the GROUP BY clause.

The following are example GROUP BY queries.

The following GROUP BY query returns the maximum amount of sales per employee.

SELECT ID, MAX(e.sales)
FROM /employees e
GROUP BY ID

The following GROUP BY query returns the minimum, maximum, total count, average and summation of IDs grouped by status.

SELECT pf.status, MIN(pf.ID), MAX(pf.ID), COUNT(pf.ID), AVG(pf.ID), SUM(pf.ID)
FROM /portfolio pf
GROUP BY pf.status

MIN

The MIN keyword returns the minimum or smallest value from the selected expression. The expression itself must always evaluate to java.lang.Comparable. The MIN statement returns the actual type of the selected element as its result.

The following are example MIN queries that return region entries (the entries implement the java.lang.Comparable interface).

SELECT MIN(pf)
FROM /exampleRegion pf

SELECT MIN(pf)
FROM /exampleRegion
pf WHERE pf.ID > 0

SELECT MIN(pf)
FROM /exampleRegion pf
WHERE pf.ID > 10 LIMIT 50

SELECT MIN(pf) 
FROM /exampleRegion pf 
WHERE pf.ID > 0 AND pf.status LIKE 'act%'

The following MIN query returns the lowest entry ID that matches the query’s selection criteria.

SELECT MIN(pf.ID)
FROM /exampleRegion pf, pf.positions.values pos
WHERE pf.ID > 0 AND pos.secId = 'IBM'

The following MIN query returns the lowest positive ID grouped by status.

SELECT pf.status, MIN(pf.ID)
FROM /exampleRegion pf
WHERE pf.ID > 0
GROUP BY pf.status

MAX

The MAX keyword returns the maximum or highest value from the selected expression. The expression itself must always evaluate to java.lang.Comparable. The MAX statement returns the type of the selected element as its result.

The following are example MAX queries that return region entries (the entries implement the java.lang.Comparable interface).

SELECT MAX(pf)
FROM /exampleRegion pf

SELECT MAX(pf)
FROM /exampleRegion
pf WHERE pf.ID > 0

SELECT MAX(pf)
FROM /exampleRegion pf
WHERE pf.ID > 10 LIMIT 50

SELECT MAX(pf) 
FROM /exampleRegion pf 
WHERE pf.ID > 0 AND pf.status LIKE 'act%'

The following MAX query returns the highest entry ID that matches the query’s selection criteria.

SELECT MAX(pf.ID)
FROM /exampleRegion pf, pf.positions.values pos
WHERE pf.ID > 0 AND pos.secId = 'IBM'

The following MAX query returns the highest positive IDs grouped by status.

SELECT pf.status, MAX(pf.ID)
FROM /exampleRegion pf
WHERE pf.ID > 0
GROUP BY pf.status

COUNT

The COUNT keyword returns the number of results that match the query selection conditions specified in the WHERE clause. Using COUNT allows you to determine the size of a results set. The COUNT statement always returns a java.lang.Integer or java.lang.Long as the result (depending on how big the value is); you should take this into consideration when executing the query: if an overflow occurs while computing the COUNT function because the value is higher than Long.MAX_VALUE (2⁶³ - 1), the result will be incorrect.

The following queries are example COUNT queries that return region entries:

SELECT COUNT(*)
FROM /exampleRegion

SELECT COUNT(*)
FROM /exampleRegion
WHERE ID > 0

SELECT COUNT(*)
FROM /exampleRegion
WHERE ID > 0 LIMIT 50

SELECT COUNT(*)
FROM /exampleRegion 
WHERE ID > 0 AND status LIKE 'act%'

SELECT COUNT(*)
FROM /exampleRegion 
WHERE ID IN SET(1,2,3,4,5)

The following COUNT query returns the total number of StructTypes that match the query’s selection criteria.

SELECT COUNT(*) 
FROM /exampleRegion p, p.positions.values pos 
WHERE p.ID > 0 AND pos.secId 'IBM'

The following COUNT query uses the DISTINCT keyword and eliminates duplicates from the number of results.

SELECT DISTINCT COUNT(*)
FROM /exampleRegion p, p.positions.values pos
WHERE p.ID > 0 OR p.status = 'active' OR pos.secId = 'IBM'

SUM

The SUM keyword returns the summation of all results that match the query selection conditions specified in the WHERE clause. Using SUM allows you to aggregate specific numeric values within a results set. For partitioned regions, each node’s buckets compute a sum over that node and return the result to the coordinator node executing the query, which then aggregates the sums across all nodes.

The SUM function where the DISTINCT modifier is applied to the expression returns the summation over the set of unique (distinct) values. For partitioned regions, the distinct values in a node’s buckets are returned to the coordinator node, which can then calculate the sum over the values that are unique across nodes, after eliminating duplicate values that come from separate nodes.

The actual expression used to calculate the aggregation should be an instance of java.lang.Number. The SUM statement always returns a java.lang.Number as the result and, depending on how big the value is and whether it has a decimal component or not, the returned type could be an instance of java.lang.Integer, java.lang.Long, java.lang.Float or java.lang.Double; you should take this into consideration when executing the query: if an overflow occurs while computing the SUM function because the value is higher than Double.MAX_VALUE ((2 - 2^-52) * 2¹⁰²³), the result will be incorrect.

The following are example SUM queries that return the summation of the entries ID.

SELECT SUM(ID)
FROM /exampleRegion

SELECT SUM(ID)
FROM /exampleRegion
WHERE ID > 0

SELECT SUM(ID)
FROM /exampleRegion
WHERE ID > 0 LIMIT 50

SELECT SUM(ID)
FROM /exampleRegion 
WHERE ID > 0 AND status LIKE 'act%'

SELECT SUM(ID)
FROM /exampleRegion 
WHERE ID IN SET(1,2,3,4,5)

The following SUM query returns the total summation of positive IDs grouped by status.

SELECT pf.status, SUM(pf.ID) 
FROM /exampleRegion pf
WHERE pf.ID > 0
GROUP BY pf.status

The following SUM query uses the DISTINCT keyword and eliminates duplicates from the aggregation.

SELECT SUM(DISTINCT pf.ID)
FROM /exampleRegion pf, pf.positions.values pos
WHERE pf.ID > 0 OR pf.status = 'active' OR pos.secId = 'IBM'

The following SUM query returns the total aggregation of positive IDs grouped by status and sorted by the aggregation result in descending order.

SELECT pf.status, SUM(pf.ID) as sm 
FROM /exampleRegion pf
WHERE pf.ID > 0
GROUP BY pf.status
ORDER BY sm DESC

AVG

The AVG keyword returns the arithmetic mean of the set formed by the selected expression. For partitioned regions, each node’s buckets provide both a sum and the number of elements to the node executing the query (coordinator), such that a correct average may be computed.

The AVG keyword where the DISTINCT modifier is applied to the expression returns the arithmetic mean of the set of unique (distinct) values. For partitioned regions, the distinct values in a node’s buckets are returned to the coordinator node, which can then calculate the average over the values that are unique across nodes, after eliminating duplicate values that come from separate nodes.

The actual expression used to calculate the aggregation should be an instance of java.lang.Number. The AVG statement always returns a java.lang.Number as the result and, depending on how big the value is and whether it has a decimal component or not, the returned type could be an instance of java.lang.Integer, java.lang.Long, java.lang.Float or java.lang.Double; you should take this into consideration when executing the query: if an overflow occurs while computing the AVG function because the value is higher than Double.MAX_VALUE ((2 - 2^-52) * 2¹⁰²³), or if an overflow occurs while computing the intermediate count because the amount of elements is higher than Long.MAX_VALUE (2⁶³ - 1), the result will be incorrect.

The following are example AVG queries that calculate the average of the entries ID.

SELECT AVG(ID)
FROM /exampleRegion

SELECT AVG(ID)
FROM /exampleRegion
WHERE ID > 0

SELECT AVG(ID)
FROM /exampleRegion
WHERE ID > 0 LIMIT 50

SELECT AVG(ID)
FROM /exampleRegion 
WHERE ID > 0 AND status LIKE 'act%'

SELECT AVG(ID)
FROM /exampleRegion 
WHERE ID IN SET(1,2,3,4,5)

The following AVG query returns the average of positive IDs grouped by status.

SELECT pf.status, AVG(pf.ID) 
FROM /exampleRegion pf
WHERE pf.ID > 0
GROUP BY pf.status

The following AVG query uses the DISTINCT keyword and eliminates duplicates from the aggregation.

SELECT AVG(DISTINCT pf.ID)
FROM /exampleRegion pf, pf.positions.values pos
WHERE pf.ID > 0 OR pf.status = 'active' OR pos.secId = 'IBM'

The following AVG query returns the average of positive IDs grouped by status and sorted by the calculation result in descending order.

SELECT pf.status, AVG(pf.ID) as sm 
FROM /exampleRegion pf
WHERE pf.ID > 0
GROUP BY pf.status
ORDER BY sm DESC