For this month’s PGSQL Phriday #011, Tomasz asked us to think about PostgreSQL partitioning vs. sharding. This is a topic near and dear to me and I’m excited to think about it some this month.
For 20+ years of database and application development, time-series data has always been at the heart of the products I work with. Inevitably, as the quantity of data grew over time, management became more difficult and query performance suffered. Over the years, the primary method for managing that growth effectively has been to partition the data. The problem is that, until recently, partitioning wasn't easy to set up in most OLTP databases like PostgreSQL or SQL Server.
Fortunately, PostgreSQL has significantly improved its ability to partition large data tables over the last 6 years, starting with PostgreSQL 10.
Before we dive into the details of what partitioning is, how it can help with your data problems, and where sharding enters the picture, let’s start with some brief definitions.
Partitioning vs. Sharding
I appreciate how Tomasz specifically called out the differences between partitioning and sharding – and I think he's right, there are times when it might seem like they are the same thing. While related, they have two distinct purposes and applications.
For the purposes of this blog post:
Partitioning: Saving data into smaller individual tables, on the same server, based on a key and algorithm. These tables are then grouped together through a parent table which acts as a transparent entryway for queries, directing inserts and selects to the proper child table. Other features like indexes matter, but not for the purpose of the general definition.
Sharding: Distributing table data into partitions located on remote servers based on a key and algorithm. In some cases, there is a parent node that directs all data and query distribution. In other cases, technology might be in place which allows applications to “query any node” without relying on a single point of entry.
So, partitioning = local instance while sharding = distributed instances.
At least that’s how I see it.
Creating Partitions with PostgreSQL
In all, I’m a huge fan of partitioning in PostgreSQL. Whether native partitioning (starting with PostgreSQL 10) or extensions like TimescaleDB and Citus that provide other flavors of partitioning, the benefits provide real-world value to data management. Regardless of which solution you use (and I won’t get into the “inheritance isn’t true partitioning” argument), getting started is straightforward.
Let’s assume this is the parent table that we create for a new partitioned table. Below are very brief examples of how to make it a partitioned table using native partitioning or purpose-built extensions.
CREATE TABLE sensor_data (
    id   int  not null,
    ts   date not null,
    temp int,
    cpu  int
) PARTITION BY RANGE (ts); -- don't add this clause for TimescaleDB
Native Partitioning:
CREATE TABLE sensor_data_y2023m07 PARTITION OF sensor_data
    FOR VALUES FROM ('2023-07-01') TO ('2023-08-01');

CREATE TABLE sensor_data_y2023m08 PARTITION OF sensor_data
    FOR VALUES FROM ('2023-08-01') TO ('2023-09-01');

...
One limitation of native partitioning at this point is that each partition needs to exist before its data is inserted. On most major hosting platforms (and local installations), the go-to solution is to use pg_partman to pre-create partitions as needed.
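For reference, a minimal pg_partman setup for the sensor_data table might look something like the sketch below. The exact arguments of partman.create_parent vary between pg_partman versions (4.x also requires a p_type argument), so treat this as an illustration rather than a copy-paste configuration.

CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;

-- Register sensor_data with pg_partman and keep a handful of future
-- monthly partitions created ahead of the data (pg_partman 5.x style call).
SELECT partman.create_parent(
    p_parent_table => 'public.sensor_data',
    p_control      => 'ts',
    p_interval     => '1 month',
    p_premake      => 4
);

-- New partitions are then created over time by the pg_partman background
-- worker or by scheduling partman.run_maintenance_proc() (e.g., with pg_cron).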
Still, many projects successfully use native partitioning, and it serves their needs well!
Timescale:
TimescaleDB is an extension that creates partitions through inheritance, but it does so automatically as data comes in. There's no need to pre-create partitions for the data. The terminology is a bit different (partitions are called chunks), but in essence the same thing is occurring. Because TimescaleDB focuses on time-series data, the partitioning key needs to be a timestamp or date column. This means the additional partitioning options available with native partitioning (hash and list) are not currently supported by TimescaleDB.
SELECT create_hypertable('sensor_data', 'ts', chunk_time_interval => '1 month'::interval);
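Once data starts flowing in, the chunks can be inspected and managed with TimescaleDB's helper functions. The calls below are a small sketch assuming TimescaleDB 2.x; the retention policy is optional and simply drops chunks older than the interval you give it.

-- List the chunks that have been created automatically for the hypertable
SELECT show_chunks('sensor_data');

-- Optionally, drop chunks older than a year on a background schedule
SELECT add_retention_policy('sensor_data', drop_after => INTERVAL '12 months');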
Citus:
Citus is a bit of an outlier here, because it actually has some helper functions for creating the time-series partitions for you (which means you don't need pg_partman), but it can also shard the data across multiple servers. You can just use the partition helpers and get a few advantages similar to TimescaleDB, like columnar compression of older partitions.
After creating the partitioned table above in PostgreSQL, you can:
SELECT create_time_partitions(
    table_name         := 'sensor_data',
    partition_interval := '1 month',
    end_at             := now() + '12 months'
);
This will begin by creating 12 future partitions, and when run each month it will create enough new partitions to stay 12 months out. This is certainly a bit easier than creating partitions by hand, or even than the added setup of pg_partman.
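On the maintenance side, Citus also ships a companion helper for expiring old partitions; in practice, both helpers are typically run on a schedule (pg_cron works well for this). A small sketch, using the sensor_data table from above:

-- Drop partitions whose data is older than two years
CALL drop_old_time_partitions('sensor_data', now() - interval '24 months');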
Advantages of Partitioning
We have an ever-growing deluge of data. Without the ability to easily partition a table, the recent data and index pages you touch most often can no longer fit easily into memory, especially as the size of your data exceeds the total memory of the server.
This is one of the primary values of partitioning data: breaking a large (or very, very large) table into many smaller pieces, all while it still appears to be a single table. Once things are broken down into these smaller pieces, index maintenance improves and current data has less contention for shared_buffers.
Additionally, removing or archiving older data becomes simpler, too. Instead of working on one big table, with large indexes, removing partition data is as simple as dropping the child table. This is almost instant in most cases and there is no index maintenance involved.
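With the sensor_data table from earlier, retiring a month of data could look something like this (DETACH ... CONCURRENTLY requires PostgreSQL 14 or later; on older versions, dropping the partition directly also works, but takes a stronger lock on the parent):

-- Detach the old child table with minimal locking, then drop (or archive) it.
ALTER TABLE sensor_data DETACH PARTITION sensor_data_y2023m07 CONCURRENTLY;
DROP TABLE sensor_data_y2023m07;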
If you have large amounts of data, particularly with a timestamp involved in some way (time-series data), partitioning sooner, rather than later, can help set you up for easier data management down the road… and often a more performant database over the long haul.
Partitioning Gotchas
There are a number of things to be aware of as you embark on partitioning in PostgreSQL. The two that I always try to discuss early on with folks deal with query performance.
Indexes are one of the advantages of partitioning in PostgreSQL because they end up being local to the child table. This keeps indexes smaller and localized when a query is executed. However, there is no global index for a partitioned table. Therefore, if you run a query that does not include the partitioning key, even if there is an index created for whatever predicate you used, PostgreSQL can’t use it to reduce the surface area it has to search.
The query planner can only prune child partitions when the predicate includes the partition key. Otherwise, the query has to touch every child table, using each partition's local index to check whether any data exists there to satisfy the query.
This can often come as a surprise to users new to PostgreSQL partitioning.
Second, partitions aren't free when it comes to query planning. The last few releases of PostgreSQL have improved query planning in numerous ways when partitions are involved, but the planner still has to consider every partition that it can't prune. Therefore, planning time can go up significantly as the number of partitions increases.
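A quick way to see both behaviors with the sensor_data table from earlier is to compare two EXPLAIN plans, one that filters on the partition key and one that doesn't:

-- Pruned: the predicate includes the partition key (ts),
-- so only the July 2023 partition appears in the plan.
EXPLAIN SELECT * FROM sensor_data
WHERE ts >= '2023-07-01' AND ts < '2023-08-01';

-- Not pruned: the predicate only references id, so every child
-- partition is scanned (each using its own local index, if one exists).
EXPLAIN SELECT * FROM sensor_data WHERE id = 42;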
What About Sharding?
Right. Well, I have some experience with it, but not as much as others. When it comes to PostgreSQL, sharding is typically provided through foreign data wrappers (FDW). Whether you set up sharding manually through sub-partitions or use an extension like Citus, queries are typically run through a leader node which then sends the queries to the additional nodes (shards) for retrieval (or data insertion).
In most cases, if your data requirements are at the point where sharding is necessary, I'd probably start with Citus (or Azure Cosmos DB for PostgreSQL, which is the hosted solution). Yugabyte, which is mostly PostgreSQL at the query layer (with a completely replaced storage layer), has also received a lot of attention lately. YMMV.
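To give a sense of what that looks like in Citus, the sketch below distributes the same sensor_data table across worker nodes by hashing the id column; it assumes the Citus extension is installed and the worker nodes have already been added to the cluster.

-- Shard sensor_data across the Citus worker nodes using id as the
-- distribution key (time-based partitioning can still be layered on top).
SELECT create_distributed_table('sensor_data', 'id');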
But for now, I'll leave sharding at that, because I'm hoping others on this PGSQL Phriday #011 will address it at some level so that I can learn too!
Part(ition)ing Thoughts
If you’re early in your application journey and you know that some of the tables have a chance at getting very big some day, start partitioning early. This will allow you to learn about some of the gotchas when data isn’t overwhelming, adjust expectations, and modify the setup when things are smaller.
If your application is already long-lived but you know partitioning (or sharding) could benefit your workload, spend time talking with others who have transitioned to a partitioned table to see what issues they ran into. The issue is that you can't simply convert a regular table into a partitioned one in place. Making a partitioned table from existing data means you'll have to create a new version of your table (with a new name) partitioned in the appropriate way, copy all of the data into it (after you've ensured the partitions are available ahead of time), and then drop the old table and rename the new one.
While none of that is terribly difficult, it will take time, and locks will be taken at various points, preventing the process from happening quickly and potentially leaving your application in a hung state.
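As a rough illustration of those steps (not a drop-in migration script), the outline below assumes the simple sensor_data table from earlier; tables with primary keys or unique constraints that don't include the partition key will need extra adjustments, and the data copy will hold locks for as long as it runs.

-- Build the new partitioned table alongside the old one.
BEGIN;
CREATE TABLE sensor_data_partitioned (LIKE sensor_data INCLUDING ALL)
    PARTITION BY RANGE (ts);
-- ...create child partitions covering the existing date range here...

-- Copy the data, then swap the names.
INSERT INTO sensor_data_partitioned SELECT * FROM sensor_data;
ALTER TABLE sensor_data RENAME TO sensor_data_old;
ALTER TABLE sensor_data_partitioned RENAME TO sensor_data;
COMMIT;

-- Once you're confident in the cutover:
-- DROP TABLE sensor_data_old;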
Happy #PGSQLPhriday!!