Like the image above, a subquery is a query written inside another query. It is a nested query that can retrieve data from several tables. It can also be used as a special filter based on the results of other queries.
The SQL execution order usually starts from the subquery, the nested query, then the parent query.
In this article, we will look at the use cases of subqueries and how you can use them to make a big difference.
SCENERIO 1:
Company A was trying to upskill her employees and asked them to purchase a course with an offer of a 60% discount. Let’s say we have two tables, “Employee” and “Order”.
You want to see employees that have already made an order or purchased. This can be achieved using a subquery.
SELECT employee_name
FROM employee
WHERE employee_id IN (SELECT employee_id FROM order);
In the example above, SQL executes the subquery first which is the ‘SELECT employee_id FROM order’. The subquery retrieves the employee IDs from the Order table. The second order of execution is the parent or outer query which is ‘SELECT employee_name FROM employee’.
The parent query fetches only the names of employees from the Employee table that have already purchased the course by using the employee_id from the Order table.
Furthermore, we also want to see employees who earn more than the average salary in the organization.
SELECT employee_name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = employees.department_id
);
SCENERIO 2:
As a data analyst in a distribution company, you want to check for products being ordered and products a customer already purchased. A subquery can be used to derive this.
SELECT product_name
FROM products
WHERE product_id IN (
SELECT product_id
FROM order_details
WHERE order_id IN (
SELECT order_id
FROM orders
WHERE customer_id = 10
)
);
The SQL execution order runs ‘SELECT order_id FROM orders’ then runs ‘SELECT product_id FROM order_details’ given where order_id exists in the Orders table and finally, run the parent query by retrieving the product names from the product table where the condition stated in the above query is satisfied.
SCENERIO 3:
A new product known as Maggie was launched and you want to see customers who have placed more orders than the average number of orders per customer for the new product.
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > (SELECT AVG(order_count) FROM (
SELECT customer_id, COUNT(*) AS order_count
FROM orders
WHERE product = 'Maggie'
GROUP BY customer_id
) AS avg_orders);
However, there can be complex SQL queries, and using a subquery might not be the best option. CTE, known as Common Table Expression, is an advanced SQL trick that can help you better and allows for query optimization.
CTEs in this case, make your query easier to read and understand at first glance hence why I’d prefer CTEs rather than using multiple nested queries.
Let’s look at a use case where you are to join multiple tables.
We will be making use of subquery in this scenario:
SELECT A.id, A.name, B.product, B.amount, C.country, C.state, C.lga, from (
SELECT id, name
FROM users
WHERE tbl_dt = '20230510'
) table_a A
LEFT JOIN
(
SELECT product, amount
FROM orders
WHERE total>100 and tbl_dt = '20230510'
) table_b B
ON A.user_id = B.id
LEFT JOIN
(
SELECT country, state, lga
FROM region
WHERE tbl_dt = '20230510'
) table_c C
ON A.user_id = C.id
Instead of the above query, a CTE can also be used to achieve this.
WITH table_a AS (
SELECT id, name
FROM users
WHERE tbl_dt = '20230510'
),
table_b AS (
SELECT product,amount
FROM orders
WHERE total>100 and tbl_dt = '20230510'
),
table_c AS(
SELECT country, state, lga
FROM region
WHERE tbl_dt = '20230510'
)
SELECT A .id, A.name, B.product, B.amount, C.country, C.state, C.lga, from
FROM table_a A
LEFT JOIN table_b B ON A.user_id = B.id
LEFT JOIN table_c C ON A.user_id = C.id
As seen in the above query, it can be easily read and allows for optimization.
CONCLUSION: A subquery is a powerful tool that allows us to retrieve specified data from different tables, extract complex insights, and filter and aggregate data dynamically which also empowers data professionals.
However, maximizing the use of subqueries and considering their performance implications is important, especially with large datasets.
Finally, it is essential to note that it is best to focus on the fundamentals and the application of your knowledge by working on projects. With constant practice, you get better and understand how it is being used.