Logo
Englika

How best to store date ranges in PostgreSQL

How best to store date ranges in PostgreSQL

In some cases, it is necessary to store a range of dates in the database (for example, hotel room reservations, calendar events). Subsequently, it is usually required to find records in the database whose date range intersects with the specified one. For example, return all events in the calendar that intersect with today.

The date range can be stored in 2 ways:

  1. Store the start and end dates in two columns.
  2. Store a date range in one column.

In this article, we will compare both ways and determine which one will have a higher search speed. As an example, we will store events that have a start and end dates. Our goal is to find all the events that intersect with the specified range as quickly as possible, so that in the future they can be shown in the calendar (events for the specified day, events for this week).

How to store a date range in one column

In PostgreSQL, there are range types that can be used to store ranges of values of a certain data type, which is called the range's subtype. This can be a range of dates for storing events in the calendar (subtype – timestamp), a range of numbers for storing instrument measurement ranges (subtype – numeric), etc. There are also mutirange types in which several ranges can be stored at once (for example, to save a meeting that will last from 11:00 to 12:00 and from 14:00 to 16:00).

The following types are used to store date ranges:

  • tsrange – the date range with the timestamp with time zone subtype. The multirange type is tsmultirange.
  • tstzrange – the date range with the timestamp with time zone subtype. The multirange type is tstzmultirange.
  • daterange – the date range with the date subtype. The multirange type is datemultirange.

The range is set as follows:

-- The meeting is scheduled for January 1, 2022 and will last from 10:00 to 12:00
SELECT '[2022-01-01 10:00, 2022-01-01 12:00]'::tsrange;

If you need to store several ranges at once, then you need to use the multirange type:

-- The meeting is scheduled for January 1, 2022 and will last from 11:00 to 12:00 and from 14:00 to 16:00
SELECT '{[2022-01-01 11:00, 2022-01-01 12:00], [2022-01-01 14:00, 2022-01-01 16:00]}'::tsmultirange;

When specifying the range, square brackets were used, which mean that the boundary values are included in this range. If a value from the lower and/or upper bound needs to be excluded from the range, then use parentheses.

-- January 1, 2022 from 8 to 9 am, including 8 am and excluding 9 am
SELECT '[2022-01-01 08:00, 2022-01-01 09:00)'::tsrange;

You can specify only the start or the end of the range:

SELECT '(, 2022-01-01 10:00]'::tsrange; -- Until January 1, 2022 10 am
SELECT '[2022-01-01 10:00,)'::tsrange; -- After January 1, 2022 10 am
SELECT '(,)'::tsrange; -- A range that includes all dates

Read more about the range types in the official documentation.

Indexes that supported date ranges

To speed up the search by ranges, you can use the GiST index or SP-GiST. Multirange types are supported only by the GiST index. You can read more about these indexes here and here.

CREATE TABLE events (during tsrange);
CREATE INDEX events_during_idx ON events USING gist (during);

The GiST and SP-GiST indexes support the following operators:

= – whether the ranges are equal.

SELECT '(2022-01-01, 2022-01-05]'::daterange = '(2022-01-01, 2022-01-05]'::daterange; -- true

&& – whether the ranges overlap.

SELECT '[2022-01-01, 2022-01-05]'::daterange && '(2022-01-04, 2022-01-10]'::daterange; -- true

<@ – whether the first range is contained by the second.

SELECT '[2022-01-02, 2022-01-04]'::daterange <@ '(2022-01-01, 2022-01-05)'::daterange; -- true

<@ – whether the element is contained in the range.

SELECT '2022-01-03'::date <@ '[2022-01-02, 2022-01-04]'::daterange; -- true

@> – whether the second range is contained by the first.

SELECT '(2022-01-01, 2022-01-05)'::daterange @> '[2022-01-02, 2022-01-04]'::daterange; -- true

@> – whether the element is contained in the range.

SELECT '[2022-01-02, 2022-01-04]'::daterange @> '2022-01-03'::date; -- true

<< – whether the first range is strictly left of the second.

SELECT '[2022-01-01, 2022-01-03]'::daterange << '[2022-01-05, 2022-01-08]'::daterange; -- true

>> – whether the first range is strictly right of the second.

SELECT '[2022-01-05, 2022-01-08]'::daterange >> '[2022-01-01, 2022-01-03]'::daterange; -- true

-|- – whether the ranges are adjacent to each other.

SELECT '[2022-01-01, 2022-01-03]'::daterange -|- '[2022-01-04, 2022-01-06]'::daterange; -- true

&< – whether the first range not extend to the right of the second.

SELECT '[2022-01-01, 2022-01-10]'::daterange &< '[2022-01-05, 2022-01-15]'::daterange; -- true

&> – whether the first range not extend to the left of the second.

SELECT '[2022-01-05, 2022-01-15]'::daterange &> '[2022-01-01, 2022-01-10]'::daterange; -- true

Constraints for date ranges

Sometimes it is necessary to make sure that events in the calendar cannot overlap with each other. To do this, you need to set an exception constraint using the && operator to check the intersection.

CREATE TABLE events (during tsrange, EXCLUDE USING gist (during WITH &&));
INSERT INTO events VALUES ('[2022-01-01 10:00, 2022-01-01 11:00)');
INSERT INTO events VALUES ('[2022-01-01 10:59, 2022-01-01 12:00)');
ERROR:  conflicting key value violates exclusion constraint "events_during_excl"
DETAIL:  Key (during)=(["2022-01-01 10:59:00+00","2022-01-01 12:00:00+00")) conflicts with existing key (during)=(["2022-01-01 10:00:00+00","2022-01-01 11:00:00+00")).

Most likely, each user will have a calendar with their own events. For example, there are several masters in the beaury salon and visitors can sign up for them. At the same time, 2 people can sign up for the same time to different masters, but it is impossible to allow someone to sign up for the time when the master will be busy with another client.

To allow the intersection of events of different users, it is necessary to add an additional comparision for the equality of the user ID to the exclusion constraint. To do this, we need the btree_gist extension.

CREATE EXTENSION btree_gist;
CREATE TABLE events (user_id int, during tsrange, EXCLUDE USING gist (user_id WITH =, during WITH &&));
INSERT INTO events VALUES (1, '[2022-01-01 10:00, 2022-01-01 11:00)');
INSERT INTO events VALUES (2, '[2022-01-01 10:00, 2022-01-01 11:00)');
INSERT INTO events VALUES (2, '[2022-01-01 10:59, 2022-01-01 12:00)');
ERROR:  conflicting key value violates exclusion constraint "events_user_id_during_excl"
DETAIL:  Key (user_id, during)=(2, ["2022-01-01 10:59:00+00","2022-01-01 12:00:00+00")) conflicts with existing key (user_id, during)=(2, ["2022-01-01 10:00:00+00","2022-01-01 11:00:00+00")).

Comparision of indexes

To search for events that overlap with the specified range (for example, to show today events in the calendar), we can use one of 3 options:

  1. Store the start and end dates. Use the B-tree index to speed up the search.
  2. Store the date range. Use the GiST index.
  3. Store the date range. Use the SP-GiST index.

For each case, we will create a separate table, add 1 million rows with the same date ranges to each, and create indexes.

-- Create tables
CREATE TABLE events_two_dates (start_date timestamp, end_date timestamp);
CREATE TABLE events_range_gist (during tsrange);
CREATE TABLE events_range_spgist (during tsrange);

-- Generate date ranges and fill in the first table
INSERT INTO events_two_dates SELECT i, i + INTERVAL '10 MINUTES' FROM (SELECT '2022-01-01'::timestamp + random() * INTERVAL '1 YEAR' as i FROM generate_series(1, 1000000)) AS t;

-- Copy date ranges from the first table to the second and third
INSERT INTO events_range_gist SELECT tsrange(start_date, end_date) FROM events_two_dates;
INSERT INTO events_range_spgist SELECT tsrange(start_date, end_date) FROM events_two_dates;

-- Create indexes
CREATE INDEX events_two_dates_idx ON events_two_dates (start_date, end_date);
CREATE INDEX events_range_gist_idx ON events_range_gist USING gist (during);
CREATE INDEX events_range_spgist_idx ON events_range_spgist USING spgist (during);

When generating ranges, initial dates are generated first, which can take values from January 1, 2022 (inclusive) to January 1, 2023 (exclusive), and then date ranges with a duration of 10 minutes are created based on them.

To test queries, we will search for events that intersect with January 5, 2022 from 8 to 9 am. There are 146 such events.

EXPLAIN ANALYZE SELECT * FROM events_two_dates WHERE start_date <= '2022-01-05 09:00'::timestamp AND end_date >= '2022-01-05 08:00'::timestamp;
 Index Only Scan using events_two_dates_idx on events_two_dates  (cost=0.42..429.94 rows=11805 width=16) (actual time=7.253..8.629 rows=146 loops=1)
   Index Cond: ((start_date <= '2022-01-05 09:00:00+00'::timestamp with time zone) AND (end_date >= '2022-01-05 08:00:00+00'::timestamp with time zone))
   Heap Fetches: 0
 Planning Time: 7.594 ms
 Execution Time: 10.156 ms

The size of the events_two_dates_idx index is 30 MB.

EXPLAIN ANALYZE SELECT * FROM events_range_gist WHERE during && '[2022-01-05 08:00, 2022-01-05 09:00]'::tsrange;
 Index Only Scan using events_range_gist_idx on events_range_gist  (cost=0.29..10.89 rows=149 width=22) (actual time=0.894..2.350 rows=146 loops=1)
   Index Cond: (during && '["2022-01-05 08:00:00+00","2022-01-05 09:00:00+00"]'::tsrange)
   Heap Fetches: 0
 Planning Time: 8.629 ms
 Execution Time: 3.604 ms

The size of the events_range_gist_idx index is 58 MB.

EXPLAIN ANALYZE SELECT * FROM events_range_spgist WHERE during && '[2022-01-05 08:00, 2022-01-05 09:00]'::tsrange;
 Index Only Scan using events_range_spgist_idx on events_range_spgist  (cost=0.29..10.77 rows=142 width=22) (actual time=0.944..2.364 rows=146 loops=1)
   Index Cond: (during && '["2022-01-05 08:00:00+00","2022-01-05 09:00:00+00"]'::tsrange)
   Heap Fetches: 0
 Planning Time: 5.096 ms
 Execution Time: 3.670 ms

The size of the events_range_spgist_idx index is 61 MB.

Conclusion

The search using range types is performed approximately 3 times faster compated to using the standard approach in the form of 2 dates (start and end dates). However, GiST and SP-GiST indexes, which are used for searching by range types, take up 2 times more disk space.

Using multirange types (only the GiST index is supported), multiple ranges can be stored in one column at once. For example, to save a meeting that will last from 11:00 to 12:00 and from 14:00 to 16:00.

As a result, it is better to use range types to store date ranges and use the GiST index to speed up the search if the disk space is not so important.

Cheat sheet

-- Storing a date range in one column
CREATE TABLE events (during tsrange);

-- Create the GiST index to speed up the search
CREATE INDEX events_during_idx ON events USING gist (during);

-- Use the && operator to search for ranges that intersect with the specified one
SELECT * FROM events WHERE during && '[2022-01-05 08:00, 2022-01-05 09:00]'::tsrange;-- Встреча назначена на 1 января 2022 и будет идти с 10 до 12
SELECT '[2022-01-01 10:00, 2022-01-01 12:00]'::tsrange;

Related posts

How to get N rows per group in SQL

Let's assume that you are developing the home page in an online store. This page should display product categories with 10 products in each. How do you make a query to the database? What indexes do you need to create to speed up query execution?

How to get N rows per group in SQL