Aggregate functions allow us to group multiple rows of data by some defining characteristic of the group, such as job department or month, so that summary statistics about the group (averages, totals, maximums etc.) can be calculated. The output is a single value which is based on the set of values in the source data.
There are five aggregate functions, namely; SUM, AVG, MIN, MAX and COUNT.
These aggregate functions are used in conjunction with the GROUP BY and HAVING clauses as part of our SELECT statement.
The GROUP BY clause will divide the rows of a table into groups that have identical values in one or more columns.
- Group by X means put all those with the same value for X in the same row.
- Group by X, Y put all those with the same values for both X and Y in the same row.
If the GROUP BY clause is omitted when an aggregate function is used, then the entire table is considered as one group, and the function displays a single value for the entire table.
The HAVING clause performs the same function as the WHERE clause, but with aggregate values. A query can contain both a WHERE clause and a HAVING clause.
Together, fits into the SELECT expression in the following manner.
SELECT column-name1 [,column-name2]
FROM table-name
WHERE search-condition
GROUP BY column-name1 [,column-name2]
HAVING some-condition
ORDER BY column-name1 [DESC] [,column-name2] [DESC]
Having vs. Where
The WHERE clause is applied first to the individual rows in the tables. Only the rows that meet the conditions in the WHERE clause go on to be grouped.
The HAVING clause is then applied to the rows in the result set to filter values in the GROUP BY clause. Only the groups that meet the HAVING conditions appear in the query output.
Breaking GROUP BY down conceptually
- The GROUP BY partitions the table records into groups. Each group has the same set of values for the column(s) specified in the GROUP BY clause.
- The aggregates then operate on any of the columns specified, with the aggregate function calculated over the rows in the partitioned group.
- The result then consists of one row per group: the GROUP BY column values, plus the associated calculated aggregated values.