1512

I understand the point of GROUP BY x.

But how does GROUP BY x, y work, and what does it mean?

1
  • 2
    You won't find it described as this question poses it. The GROUP BY clause can take one or more fields. GROUP BY customer; GROUP BY lastname, firstname; GROUP BY year, store, sku etc.
    – Bill
    Mar 10, 2010 at 23:32

3 Answers 3

2781

Group By X means put all those with the same value for X in the one group.

Group By X, Y means put all those with the same values for both X and Y in the one group.

To illustrate using an example, let's say we have the following table, to do with who is attending what subject at a university:

Table: Subject_Selection

+---------+----------+----------+
| Subject | Semester | Attendee |
+---------+----------+----------+
| ITB001  |        1 | John     |
| ITB001  |        1 | Bob      |
| ITB001  |        1 | Mickey   |
| ITB001  |        2 | Jenny    |
| ITB001  |        2 | James    |
| MKB114  |        1 | John     |
| MKB114  |        1 | Erica    |
+---------+----------+----------+

When you use a group by on the subject column only; say:

select Subject, Count(*)
from Subject_Selection
group by Subject

You will get something like:

+---------+-------+
| Subject | Count |
+---------+-------+
| ITB001  |     5 |
| MKB114  |     2 |
+---------+-------+

...because there are 5 entries for ITB001, and 2 for MKB114

If we were to group by two columns:

select Subject, Semester, Count(*)
from Subject_Selection
group by Subject, Semester

we would get this:

+---------+----------+-------+
| Subject | Semester | Count |
+---------+----------+-------+
| ITB001  |        1 |     3 |
| ITB001  |        2 |     2 |
| MKB114  |        1 |     2 |
+---------+----------+-------+

This is because, when we group by two columns, it is saying "Group them so that all of those with the same Subject and Semester are in the same group, and then calculate all the aggregate functions (Count, Sum, Average, etc.) for each of those groups". In this example, this is demonstrated by the fact that, when we count them, there are three people doing ITB001 in semester 1, and two doing it in semester 2. Both of the people doing MKB114 are in semester 1, so there is no row for semester 2 (no data fits into the group "MKB114, Semester 2")

Hopefully that makes sense.

12
  • 40
    @Smashery: So does this also mean that GROUP BY A,B is same as GROUP BY B,A? Sep 26, 2014 at 18:08
  • 69
    Yes, it does. I can't say for certain whether they are as efficient as each other, but they will give the same result, yes.
    – Smashery
    Sep 29, 2014 at 0:10
  • 5
    May here should be added that there is a difference between GROUP BY a, b and GROUP BY a AND b since the second one only lists grouped items with exactly the same content and no "undergroups". In this case the output would be same like the first one.
    – Dwza
    Mar 4, 2015 at 15:06
  • 6
    I would like to add that the order in which you group by the columns does not matter. In the above example group by Semester, Subject would have given the same result Sep 29, 2015 at 21:07
  • 6
    well, group by a, b and group by b, a do NOT return the same result - the rows are displayed in a different order
    – fanny
    Oct 3, 2018 at 9:14
106

Here I am going to explain not only the GROUP clause use, but also the Aggregate functions use.

The GROUP BY clause is used in conjunction with the aggregate functions to group the result-set by one or more columns. e.g.:

-- GROUP BY with one parameter:
SELECT column_name, AGGREGATE_FUNCTION(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;

-- GROUP BY with two parameters:
SELECT
    column_name1,
    column_name2,
    AGGREGATE_FUNCTION(column_name3)
FROM
    table_name
GROUP BY
    column_name1,
    column_name2;

Remember this order:

  1. SELECT (is used to select data from a database)

  2. FROM (clause is used to list the tables)

  3. WHERE (clause is used to filter records)

  4. GROUP BY (clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns)

  5. HAVING (clause is used in combination with the GROUP BY clause to restrict the groups of returned rows to only those whose the condition is TRUE)

  6. ORDER BY (keyword is used to sort the result-set)

You can use all of these if you are using aggregate functions, and this is the order that they must be set, otherwise you can get an error.

Aggregate Functions are:

MIN() returns the smallest value in a given column

MAX() returns the maximum value in a given column.

SUM() returns the sum of the numeric values in a given column

AVG() returns the average value of a given column

COUNT() returns the total number of values in a given column

COUNT(*) returns the number of rows in a table

SQL script examples about using aggregate functions:

Let's say we need to find the sale orders whose total sale is greater than $950. We combine the HAVING clause and the GROUP BY clause to accomplish this:

SELECT 
    orderId, SUM(unitPrice * qty) Total
FROM
    OrderDetails
GROUP BY orderId
HAVING Total > 950;

Counting all orders and grouping them customerID and sorting the result ascendant. We combine the COUNT function and the GROUP BY, ORDER BY clauses and ASC:

SELECT 
    customerId, COUNT(*)
FROM
    Orders
GROUP BY customerId
ORDER BY COUNT(*) ASC;

Retrieve the category that has an average Unit Price greater than $10, using AVG function combine with GROUP BY and HAVING clauses:

SELECT 
    categoryName, AVG(unitPrice)
FROM
    Products p
INNER JOIN
    Categories c ON c.categoryId = p.categoryId
GROUP BY categoryName
HAVING AVG(unitPrice) > 10;

Getting the less expensive product by each category, using the MIN function in a subquery:

SELECT categoryId,
       productId,
       productName,
       unitPrice
FROM Products p1
WHERE unitPrice = (
                SELECT MIN(unitPrice)
                FROM Products p2
                WHERE p2.categoryId = p1.categoryId)

The following will show you how to select the most recent date item "productDate", using MAX function in a subquery:

SELECT categoryId,
       productId,
       productName,
       unitPrice,
       productDate
FROM Products p1
WHERE productDate= (
                  SELECT MAX(productDate) 
                  FROM Products p2
                  WHERE p2.categoryId = p1.categoryId)

The following statement groups rows with the same values in both categoryId and productId columns:

SELECT 
    categoryId, categoryName, productId, SUM(unitPrice)
FROM
    Products p
INNER JOIN
    Categories c ON c.categoryId = p.categoryId
GROUP BY categoryId, productId

To answer a question: - what if use GROUP BY but no aggregate function? The answer is: We can also use the GROUP BY without applying the Aggregate function. Here is an example where we are grouping by categoryId:

SELECT categoryId,
       productId,
       productName,
       unitPrice
FROM Products
GROUP BY categoryId
6
  • 4
    but where do we put the 2 columns, how to aggregate based on 2/more columns is the question Nov 17, 2017 at 22:13
  • 6
    This doesn't even remotely answer the question... The question here is how to acheive "chained grouping" of "subject" and "semester" at the same time, as explained in the given example...
    – MahNas92
    May 20, 2019 at 10:22
  • 1
    The last example shows you how to put 2 columns using aggregate function. @ChaitanyaBapat
    – S. Mayol
    Jan 7, 2021 at 20:47
  • This doesn't work in Oracle if you have more select column that you don't wan't to group.
    – Marco
    Aug 10, 2022 at 22:48
  • what if use GROUP BY but no aggregate function? Nov 30, 2023 at 14:07
4

In simple English from GROUP BY with two parameters what we are doing is looking for similar value pairs and get the count to a 3rd column.

Look at the following example for reference. Here I'm using International football results from 1872 to 2020

+----------+----------------+--------+---+---+--------+---------+-------------------+-----+
|       _c0|             _c1|     _c2|_c3|_c4|     _c5|      _c6|                _c7|  _c8|
+----------+----------------+--------+---+---+--------+---------+-------------------+-----+
|1872-11-30|        Scotland| England|  0|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1873-03-08|         England|Scotland|  4|  2|Friendly|   London|            England|FALSE|
|1874-03-07|        Scotland| England|  2|  1|Friendly|  Glasgow|           Scotland|FALSE|
|1875-03-06|         England|Scotland|  2|  2|Friendly|   London|            England|FALSE|
|1876-03-04|        Scotland| England|  3|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1876-03-25|        Scotland|   Wales|  4|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1877-03-03|         England|Scotland|  1|  3|Friendly|   London|            England|FALSE|
|1877-03-05|           Wales|Scotland|  0|  2|Friendly|  Wrexham|              Wales|FALSE|
|1878-03-02|        Scotland| England|  7|  2|Friendly|  Glasgow|           Scotland|FALSE|
|1878-03-23|        Scotland|   Wales|  9|  0|Friendly|  Glasgow|           Scotland|FALSE|
|1879-01-18|         England|   Wales|  2|  1|Friendly|   London|            England|FALSE|
|1879-04-05|         England|Scotland|  5|  4|Friendly|   London|            England|FALSE|
|1879-04-07|           Wales|Scotland|  0|  3|Friendly|  Wrexham|              Wales|FALSE|
|1880-03-13|        Scotland| England|  5|  4|Friendly|  Glasgow|           Scotland|FALSE|
|1880-03-15|           Wales| England|  2|  3|Friendly|  Wrexham|              Wales|FALSE|
|1880-03-27|        Scotland|   Wales|  5|  1|Friendly|  Glasgow|           Scotland|FALSE|
|1881-02-26|         England|   Wales|  0|  1|Friendly|Blackburn|            England|FALSE|
|1881-03-12|         England|Scotland|  1|  6|Friendly|   London|            England|FALSE|
|1881-03-14|           Wales|Scotland|  1|  5|Friendly|  Wrexham|              Wales|FALSE|
|1882-02-18|Northern Ireland| England|  0| 13|Friendly|  Belfast|Republic of Ireland|FALSE|
+----------+----------------+--------+---+---+--------+---------+-------------------+-----+

And now I'm going to group by similar country(column _c7) and tournament(_c5) value pairs by GROUP BY operation,

SELECT `_c5`,`_c7`,count(*)  FROM res GROUP BY `_c5`,`_c7`

+--------------------+-------------------+--------+
|                 _c5|                _c7|count(1)|
+--------------------+-------------------+--------+
|            Friendly|  Southern Rhodesia|      11|
|            Friendly|            Ecuador|      68|
|African Cup of Na...|           Ethiopia|      41|
|Gold Cup qualific...|Trinidad and Tobago|       9|
|AFC Asian Cup qua...|             Bhutan|       7|
|African Nations C...|              Gabon|       2|
|            Friendly|           China PR|     170|
|FIFA World Cup qu...|             Israel|      59|
|FIFA World Cup qu...|              Japan|      61|
|UEFA Euro qualifi...|            Romania|      62|
|AFC Asian Cup qua...|              Macau|       9|
|            Friendly|        South Sudan|       1|
|CONCACAF Nations ...|           Suriname|       3|
|         Copa Newton|          Argentina|      12|
|            Friendly|        Philippines|      38|
|FIFA World Cup qu...|              Chile|      68|
|African Cup of Na...|         Madagascar|      29|
|FIFA World Cup qu...|       Burkina Faso|      30|
| UEFA Nations League|            Denmark|       4|
|        Atlantic Cup|           Paraguay|       2|
+--------------------+-------------------+--------+

Explanation: The meaning of the first row is there were 11 Friendly tournaments held on Southern Rhodesia in total.

Note: Here it's mandatory to use a counter column in this case.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.