Understanding Subqueries in PostgreSQL

Subqueries, also known as nested queries, are queries embedded within other SQL queries.

In PostgreSQL, subqueries are a powerful tool to help filter, manipulate, and aggregate data dynamically. They allow you to perform complex data retrieval operations without the need for temporary tables or joins in some cases. A subquery can return a single value, a list of values, or a set of rows, and it can be used in various clauses, such as SELECT, WHERE, FROM, and HAVING.

Table of Contents

Types of Subqueries in PostgreSQL

Subqueries in the WHERE Clause
Subqueries in the FROM Clause
Subqueries in the SELECT Clause
Subqueries in the HAVING Clause
Subqueries in UPDATE or DELETE Statements

Let’s explore each type in detail with examples.

Subqueries in the `WHERE` Clause

A subquery in the WHERE clause is typically used to filter records based on the results of another query. This is one of the most common uses of subqueries in PostgreSQL.

Get Your Free Linux training!

Join our free Linux training and discover the power of open-source technology. Enhance your skills and boost your career! Learn Linux for Free!

Example 1: Employees with Salary Above Department Average

Let’s say we have two tables: employees and departments, and we want to find employees whose salary is greater than the average salary in their department.

SELECT name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = 2
);

Explanation:

The subquery calculates the average salary for employees in department 2:
SELECT AVG(salary) FROM employees WHERE department_id = 2.
The outer query retrieves employees whose salary is greater than this average.

Example 2: Employees Involved in Projects

If you have an employees table and a projects table, and you want to find all employees who are assigned to at least one project:

SELECT name
FROM employees
WHERE id IN (
    SELECT employee_id
    FROM projects
);

Explanation:

The subquery returns the list of employee_id values that appear in the projects table (i.e., employees involved in projects).
The outer query retrieves all employees whose id is in the list returned by the subquery.

Subqueries in the `FROM` Clause

Subqueries in the FROM clause allow you to use the result of a subquery as if it were a table. This is useful when you need to aggregate or filter data before joining with other tables.

Example 3: Highest Salary in Each Department

Suppose you have a table employees and you want to find the highest salary in each department. Here’s how to do it using a subquery in the FROM clause:

SELECT d.department_name, e.max_salary
FROM departments d
JOIN (
    SELECT department_id, MAX(salary) AS max_salary
    FROM employees
    GROUP BY department_id
) e ON d.id = e.department_id;

Explanation:

The subquery (SELECT department_id, MAX(salary) ...) finds the maximum salary for each department.
The outer query then joins the result of the subquery (e) with the departments table to show the department name along with the highest salary.

Subqueries in the `SELECT` Clause

You can use a subquery in the SELECT clause to return a calculated value for each row in the result set.

Example 4: Employee Age with Department Average Age

If you want to retrieve each employee’s age alongside the average age of employees in their department, you can use a subquery in the SELECT clause:

SELECT e.name, e.age,
       (SELECT AVG(age) 
        FROM employees 
        WHERE department_id = e.department_id) AS avg_department_age
FROM employees e;

Explanation:

The subquery (SELECT AVG(age) FROM employees WHERE department_id = e.department_id) calculates the average age for each department dynamically.
The outer query retrieves each employee’s name, age, and the average age in their department.

Subqueries in the `HAVING` Clause

Subqueries in the HAVING clause allow you to filter groups after aggregation. This can be useful when you want to apply conditions on aggregated data, based on values from another subquery.

Example 5: Departments with More than 5 Employees

If you want to find departments that have more than 5 employees, you can use a subquery in the HAVING clause:

SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > (
    SELECT 5
);

Explanation:

The subquery (SELECT 5) returns the constant value 5, which is used as a comparison.
The outer query groups employees by department_id and filters the departments where the employee count exceeds 5.

Subqueries in `UPDATE` or `DELETE` Statements

You can also use subqueries in UPDATE or DELETE statements to modify or remove data based on conditions calculated by another query.

Example 6: Update Employee Salaries Based on Average Salary

Let’s say you want to give a 10% raise to employees whose salary is below the average salary in their department.

UPDATE employees
SET salary = salary * 1.10
WHERE department_id = 3
AND salary < (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = 3
);

Explanation:

The subquery (SELECT AVG(salary) FROM employees WHERE department_id = 3) calculates the average salary for department 3.
The UPDATE query applies a 10% raise to employees whose salary is below the average for their department.

Example 7: Delete Employees Earning Below Department Average

If you want to delete employees who earn less than the average salary in their department, you can use a subquery like this:

DELETE FROM employees
WHERE salary < (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = employees.department_id
);

Explanation:

The subquery (SELECT AVG(salary) FROM employees WHERE department_id = employees.department_id) calculates the average salary for each department.
The outer query deletes employees whose salary is less than the average salary in their department.

Summary

Subqueries are an essential tool in PostgreSQL for performing complex queries and data manipulation without the need for intermediate tables. They can be used in various SQL clauses, including WHERE, FROM, SELECT, and HAVING, to filter, aggregate, and calculate data in more sophisticated ways. Here’s a quick recap of the types of subqueries:

In the WHERE Clause: Filter records based on conditions derived from another query.
In the FROM Clause: Treat the result of a subquery as a temporary table for joins.
In the SELECT Clause: Calculate values dynamically for each row in the result.
In the HAVING Clause: Filter aggregated data after grouping.
In UPDATE or DELETE: Modify or remove data based on conditions calculated from subqueries.

By using subqueries effectively, you can write more efficient and readable SQL queries in PostgreSQL, improving both the performance and clarity of your database operations.

Types of Subqueries in PostgreSQL

Subqueries in the WHERE Clause

Example 1: Employees with Salary Above Department Average

Example 2: Employees Involved in Projects

Subqueries in the FROM Clause

Example 3: Highest Salary in Each Department

Subqueries in the SELECT Clause

Example 4: Employee Age with Department Average Age

Subqueries in the HAVING Clause

Example 5: Departments with More than 5 Employees

Subqueries in UPDATE or DELETE Statements

Example 6: Update Employee Salaries Based on Average Salary

Example 7: Delete Employees Earning Below Department Average

Summary

Subqueries in the `WHERE` Clause

Subqueries in the `FROM` Clause

Subqueries in the `SELECT` Clause

Subqueries in the `HAVING` Clause

Subqueries in `UPDATE` or `DELETE` Statements