In PostgreSQL, partitioning is a technique for managing large tables by splitting them into smaller pieces.
This can improve query performance and simplify maintenance operations. There are two main types of partitioning: vertical partitioning and horizontal partitioning. Let’s explore both concepts.
1. Horizontal Partitioning
Horizontal partitioning involves dividing a table into smaller, more manageable tables based on rows, where each partition stores a subset of the rows. This is usually done based on some range or list of values from one or more columns. It helps improve query performance by limiting the number of rows to scan during queries, reducing I/O and making data more manageable.
Key Concepts:
- Partition Key: The column(s) used to determine how rows are distributed across partitions.
- Partition Types:
  - Range Partitioning: Rows are divided based on a range of values (e.g., dates, numeric values).
  - List Partitioning: Rows are divided based on specific values of a column (e.g., categorical data like countries, regions).
  - Hash Partitioning: Rows are divided based on a hash function applied to a column’s values, which spreads rows roughly evenly across partitions when the key values are well distributed.
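The list and hash types can be sketched as follows. This is a minimal illustration rather than a recommended schema: the table names, columns, region codes, and the modulus of 4 are all assumptions made up for the example, and it assumes PostgreSQL 11 or later (where hash partitioning and DEFAULT partitions are available).

```sql
-- Hypothetical tables for illustration only.

-- List partitioning: each partition holds an explicit set of key values.
CREATE TABLE customers_by_region (
    id     bigint NOT NULL,
    region text   NOT NULL,
    name   text
) PARTITION BY LIST (region);

CREATE TABLE customers_emea PARTITION OF customers_by_region
    FOR VALUES IN ('EU', 'ME', 'AF');
CREATE TABLE customers_amer PARTITION OF customers_by_region
    FOR VALUES IN ('NA', 'SA');
-- Optional catch-all for region values not listed above.
CREATE TABLE customers_other PARTITION OF customers_by_region DEFAULT;

-- Hash partitioning: a row goes to the partition whose remainder matches
-- the hash of its key modulo the partition count.
CREATE TABLE events (
    id      bigint NOT NULL,
    user_id bigint NOT NULL,
    payload text
) PARTITION BY HASH (user_id);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```

List partitioning suits keys with a small, known set of values, while hash partitioning is mostly useful when there is no natural range or category to split on.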
Example of Horizontal Partitioning:
Let’s say we have a sales table, and we want to partition it by year (range partitioning).
CREATE TABLE sales (
    id serial,
    sale_date DATE NOT NULL,
    amount DECIMAL,
    -- A primary key on a partitioned table must include the partition key column
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (sale_date);
CREATE TABLE sales_2020 PARTITION OF sales FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');
CREATE TABLE sales_2021 PARTITION OF sales FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
CREATE TABLE sales_2022 PARTITION OF sales FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
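In this example, the sales table is partitioned by the sale_date column, and each partition contains sales for a particular year. Range bounds are inclusive at the lower end and exclusive at the upper end, so a sale dated 2021-01-01 is stored in sales_2021, not sales_2020.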
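To see the routing in action, one could insert a few rows through the parent table and ask PostgreSQL which partition each row landed in. The sample values below are invented for illustration; `tableoid::regclass` is a standard way to show the name of the table that physically holds a row.

```sql
-- Rows are inserted through the parent; PostgreSQL routes each one
-- to the partition whose range contains its sale_date.
INSERT INTO sales (sale_date, amount) VALUES
    ('2020-06-15', 120.00),
    ('2021-03-02', 75.50),
    ('2022-11-30', 310.25);

-- tableoid::regclass reports the partition that stores each row.
SELECT tableoid::regclass AS partition_name, id, sale_date, amount
FROM sales
ORDER BY sale_date;

-- A date outside every defined range is rejected unless a DEFAULT
-- partition exists, e.g. this statement would fail with a
-- "no partition of relation ... found for row" error:
-- INSERT INTO sales (sale_date, amount) VALUES ('2024-01-01', 10.00);
```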
Benefits of Horizontal Partitioning:
- Improved Performance: Queries that access only a subset of the data (e.g., a specific year) will be more efficient as only relevant partitions will be scanned.
- Easier Maintenance: Each partition can be independently managed, backed up, or indexed.
- Better Query Planning: PostgreSQL can use partition pruning to skip partitions that are irrelevant for a given query, reducing the query cost.
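The benefits listed above can be checked against the sales table from the example. The following is a sketch under the assumption that partition pruning is enabled (the default setting); the index name is hypothetical.

```sql
-- Pruning: the filter covers only 2021, so the plan should reference
-- sales_2021 and skip the other partitions.
EXPLAIN (COSTS OFF)
SELECT sum(amount)
FROM sales
WHERE sale_date >= DATE '2021-01-01'
  AND sale_date <  DATE '2022-01-01';

-- Independent indexing: an index can be created on a single partition.
CREATE INDEX sales_2022_amount_idx ON sales_2022 (amount);

-- Independent maintenance: detaching turns a partition into a standalone
-- table, which can then be archived or dropped without touching the rest.
ALTER TABLE sales DETACH PARTITION sales_2020;
-- DROP TABLE sales_2020;  -- uncomment once the data is no longer needed
```

Detaching and dropping a partition is usually much cheaper than deleting the same rows with a DELETE statement, which is one reason time-based range partitioning is popular for data with a retention period.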
2. Vertical Partitioning
Vertical partitioning, on the other hand, involves dividing a table into smaller tables based on columns.
Each partition stores a subset of the columns of the original table, typically used to optimize the performance of certain queries, especially when working with wide tables with many columns.
Vertical partitioning is generally used in cases where some columns are frequently queried together, and others are seldom used, allowing you to optimize storage and improve cache efficiency.
Key Concepts:
- Column Splitting: Splitting the columns of a table into separate physical tables, where each table contains a subset of the columns.
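- Benefits: Reduced I/O for queries that only need a subset of columns and improved cache locality for frequently accessed columns. In PostgreSQL the gain comes mainly from narrower rows (more rows fit on each page); NULL values already take very little space, so splitting off mostly-NULL columns saves less than it might seem.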
Example of Vertical Partitioning:
Let’s assume we have a customer table with many columns, but some of them are rarely accessed together. You could split this table into two smaller tables.
-- Original table
CREATE TABLE customer (
    id serial PRIMARY KEY,
    name TEXT,
    email TEXT,
    phone_number TEXT,
    address TEXT,
    loyalty_points INT
);
-- Vertical partitioning: Split into two tables
CREATE TABLE customer_basic_info (
    id serial PRIMARY KEY,
    name TEXT,
    email TEXT,
    phone_number TEXT
);
CREATE TABLE customer_loyalty (
    -- Reuses the id from customer_basic_info so the two halves of a row stay linked
    id INT PRIMARY KEY REFERENCES customer_basic_info (id),
    loyalty_points INT,
    address TEXT
);
Here, we’ve created two smaller tables: one for basic information (name, email, and phone) and one for loyalty points and address. Queries that only need basic customer info can be faster because they only access the narrower customer_basic_info table.
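Because one logical customer now spans two tables, both rows need to be written together. One possible approach, shown here as a sketch with invented sample values, is a data-modifying CTE that inserts the basic row and reuses its generated id for the loyalty row in a single statement.

```sql
-- Insert the basic row first and capture its generated id,
-- then use that id for the matching loyalty row.
WITH new_customer AS (
    INSERT INTO customer_basic_info (name, email, phone_number)
    VALUES ('Ada Example', 'ada@example.com', '555-0100')
    RETURNING id
)
INSERT INTO customer_loyalty (id, loyalty_points, address)
SELECT id, 0, '1 Example Street'
FROM new_customer;

-- Queries that only need contact details touch a single table.
SELECT name, email
FROM customer_basic_info
WHERE email = 'ada@example.com';
```

Running both inserts as one statement means either both rows are stored or neither is, which avoids customers with only half of their data.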
Benefits of Vertical Partitioning:
- Improved Query Performance: For queries that need only a subset of columns, you avoid reading unnecessary columns from disk, saving I/O.
- Better Caching: Smaller tables can fit in memory better, improving cache efficiency.
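- Compression: Gains here are limited in PostgreSQL, whose row-oriented heap compresses only large individual values (via TOAST) rather than whole columns. The main storage benefit of moving bulky, seldom-read columns to their own table is that the frequently read rows stay small.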
Considerations:
- Vertical partitioning in PostgreSQL doesn’t have built-in support like horizontal partitioning. You typically need to manage it manually by splitting the table into smaller tables or using views.
- Queries often need to join the vertically partitioned tables together, so this can lead to increased complexity in the application layer or query planning.
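One way to hide the split from the application, assuming the two tables from the example above, is a view that joins them back together. The view name `customer_full` is hypothetical.

```sql
-- Presents the vertically split tables as one logical customer row.
-- LEFT JOIN keeps customers that have no loyalty row yet.
CREATE VIEW customer_full AS
SELECT b.id,
       b.name,
       b.email,
       b.phone_number,
       l.address,
       l.loyalty_points
FROM customer_basic_info AS b
LEFT JOIN customer_loyalty AS l ON l.id = b.id;

-- Callers that need every column query the view; the join cost is paid
-- only by these queries, not by those reading customer_basic_info alone.
SELECT name, loyalty_points
FROM customer_full
WHERE id = 1;
```

A view like this is read-oriented: because it joins two tables, PostgreSQL does not treat it as automatically updatable, so writes would still go to the underlying tables (or through INSTEAD OF triggers).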
Differences Between Vertical and Horizontal Partitioning
Aspect | Horizontal Partitioning (Row-based) | Vertical Partitioning (Column-based) |
---|---|---|
Data Division | Divides data into smaller tables based on rows. | Divides data into smaller tables based on columns. |
Partitioning Criteria | Typically based on ranges, lists, or hash of column values. | Based on columns that are often accessed together. |
Use Case | Large tables with a lot of data (time-series, logs, etc.). | Wide tables with many columns (selectively accessed). |
Query Performance | Improves performance by limiting the number of rows scanned. | Improves performance by limiting the number of columns read. |
Indexing | Each partition can have its own indexes. | Indexing is on a per-table basis for the vertically split tables. |
Implementation Complexity | Relatively simple to implement with PostgreSQL partitioning. | Requires more manual effort and potentially complex joins. |
Conclusion
Both horizontal and vertical partitioning offer distinct benefits for managing large datasets in PostgreSQL:
- Horizontal Partitioning is best for handling large datasets where queries can benefit from pruning irrelevant rows (e.g., partitioning by date, geographic region).
- Vertical Partitioning can be useful for optimizing queries that only need a subset of columns, or when dealing with wide tables where not all columns are frequently accessed together.
PostgreSQL’s built-in support for horizontal partitioning makes it easier to implement, while vertical partitioning usually requires more manual effort or custom table designs.