Table layout
Revise the table layout and store the hours of operation ("hoo") as a set of tsrange values (range of timestamp without time zone). Requires Postgres 9.2 or later.
Pick a random week as a staging ground for your opening hours. I like this week:
1996-01-01 (Monday) - 1996-01-07 (Sunday)
That is the most recent leap year in which January 1st conveniently falls on a Monday. But any random week will do for this purpose; just be consistent.
First, install the additional module btree_gist. Why?
CREATE EXTENSION btree_gist;
Create the table as follows:
CREATE TABLE hoo (
  hoo_id  serial PRIMARY KEY
, shop_id int NOT NULL REFERENCES shop(shop_id)  -- reference to shop
, hours   tsrange NOT NULL
, CONSTRAINT hoo_no_overlap       EXCLUDE USING gist (shop_id WITH =, hours WITH &&)
, CONSTRAINT hoo_bounds_inclusive CHECK (lower_inc(hours) AND upper_inc(hours))
, CONSTRAINT hoo_standard_week    CHECK (hours <@ tsrange '[1996-01-01 0:0, 1996-01-08 0:0]')
);
The single hours column replaces your four columns:
opens_on, closes_on, opens_at, closes_at
For example, opening hours from Wednesday, 18:30 to Thursday, 05:00 UTC are entered as:
'[1996-01-03 18:30, 1996-01-04 05:00]'
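For illustration, a raw INSERT for that example could look like this (shop_id 1 is a placeholder; in practice you would go through the function f_hoo_hours() introduced below):

INSERT INTO hoo (shop_id, hours)  -- shop_id 1 is a placeholder
VALUES (1, tsrange '[1996-01-03 18:30, 1996-01-04 05:00]');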
The exclusion constraint hoo_no_overlap prevents overlapping entries per shop. It is implemented with a GiST index, which also happens to support our query. See the chapter "Index and Performance" below, which discusses the indexing strategy.
The CHECK constraint hoo_bounds_inclusive enforces inclusive bounds on your ranges, with two notable consequences:
- A point in time falling on the lower or upper bound is always included.
- Adjacent entries for the same shop are effectively forbidden. With inclusive bounds, those would "overlap", and the exclusion constraint would raise an exception. Adjacent entries must be merged into a single row instead. Except when they wrap around Sunday midnight, in which case they have to be split into two rows; the function f_hoo_hours() below takes care of that. (The rejection of adjacent rows is demonstrated right after this list.)
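For illustration, assuming a shop with shop_id = 1 exists, these two adjacent entries would be rejected because the shared bound counts as overlap:

INSERT INTO hoo (shop_id, hours) VALUES
  (1, tsrange '[1996-01-01 08:00, 1996-01-01 12:00]')   -- Mon 08:00 - 12:00
, (1, tsrange '[1996-01-01 12:00, 1996-01-01 18:00]');  -- Mon 12:00 - 18:00
-- fails: both ranges include Mon 12:00, violating hoo_no_overlap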
The CHECK constraint hoo_standard_week enforces the outer bounds of the staging week, using the "range is contained by" operator <@.
With inclusive bounds, you have to observe a corner case where time wraps around at Sunday midnight:
'1996-01-01 00:00+0' = '1996-01-08 00:00+0'
 Mon 00:00           =  Sun 24:00 (= next Mon 00:00)
You would have to search for both timestamps at once in that case. Here is a related case with exclusive upper bounds that would not exhibit this shortcoming (a sketch of the two-timestamp search follows the link below):
- Prevent contiguous / overlapping records with EXCLUDE in PostgreSQL
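A sketch of what that two-timestamp search could look like with this design (my own variant for the corner case; the simple query in the "Query" chapter below does not cover it):

SELECT *
FROM   hoo
WHERE  hours @> f_hoo_time(now())
   OR (f_hoo_time(now()) = timestamp '1996-01-01 00:00'  -- exactly Mon 00:00 ...
       AND hours @> timestamp '1996-01-08 00:00');       -- ... also matches Sun 24:00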
Function f_hoo_time(timestamptz)
To "normalize" any given timestamp with time zone:
CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
  RETURNS timestamp AS
$func$
SELECT date '1996-01-01'
     + ($1 AT TIME ZONE 'UTC' - date_trunc('week', $1 AT TIME ZONE 'UTC'))
$func$ LANGUAGE sql IMMUTABLE;
The function takes a timestamptz and returns a timestamp. It adds the interval that has elapsed in the respective week ($1 - date_trunc('week', $1)), computed in UTC time (!), to the starting point of our staging week. (date + interval produces a timestamp.)
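A quick sanity check, assuming the function above is in place (the expected output is derived by hand, not taken from the source):

SELECT f_hoo_time(timestamptz '2016-01-13 18:30+00');  -- a Wednesday evening
-- expected: 1996-01-03 18:30:00  (Wednesday of the staging week)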
Function f_hoo_hours(timestamptz, timestamptz)
To normalize ranges and to split those crossing Mon 00:00. This function takes any interval (given as two timestamptz) and produces one or two normalized tsrange values. It covers any legal input and disallows the rest:
CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz, _to timestamptz)
  RETURNS TABLE (hoo_hours tsrange) AS
$func$
DECLARE
   ts_from timestamp := f_hoo_time(_from);
   ts_to   timestamp := f_hoo_time(_to);
BEGIN
   -- test input for sanity (optional)
   IF _to <= _from THEN
      RAISE EXCEPTION '%', '_to must be later than _from!';
   ELSIF _to > _from + interval '1 week' THEN
      RAISE EXCEPTION '%', 'Interval cannot span more than a week!';
   END IF;

   IF ts_from > ts_to THEN  -- split range at Mon 00:00
      RETURN QUERY
      VALUES (tsrange('1996-01-01 0:0', ts_to, '[]'))
           , (tsrange(ts_from, '1996-01-08 0:0', '[]'));
   ELSE                     -- simple case: range within standard week
      hoo_hours := tsrange(ts_from, ts_to, '[]');
      RETURN NEXT;
   END IF;

   RETURN;
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 1000 ROWS 1;
To INSERT a single row of input:
INSERT INTO hoo(shop_id, hours)
SELECT 123, f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');
This results in two rows if the range needs to be split at Mon 00:00.
To INSERT multiple rows of input:
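Walking through that example by hand ('2016-01-11 00:00+04' is Sunday 20:00 UTC, '2016-01-11 08:00+04' is Monday 04:00 UTC), I would expect the split to produce these two ranges:

SELECT * FROM f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');
-- expected (derived by hand, not copied from the source):
--   ["1996-01-01 00:00:00","1996-01-01 04:00:00"]
--   ["1996-01-07 20:00:00","1996-01-08 00:00:00"]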
INSERT INTO hoo(shop_id, hours)
SELECT id, hours
FROM  (
   VALUES (7, timestamp '2016-01-11 00:00', timestamp '2016-01-11 08:00')
        , (8, '2016-01-11 00:00', '2016-01-11 08:00')
   ) t(id, f, t)
     , f_hoo_hours(f, t) hours;  -- LATERAL join
About the implicit LATERAL join:
- What is the difference between LATERAL and a subquery in PostgreSQL?
Query
With the adjusted design, your whole big, complex, expensive query is replaced with ... this:
SELECT *
FROM hoo
WHERE hours @> f_hoo_time(now());
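To check a single shop instead (123 is just a placeholder id), add an equality condition; the multicolumn GiST index behind the exclusion constraint can support both predicates:

SELECT EXISTS (
   SELECT 1
   FROM   hoo
   WHERE  shop_id = 123               -- placeholder id
   AND    hours @> f_hoo_time(now())
   ) AS is_open_now;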
The query is supported by said GiST index and is fast, even for big tables.
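To verify that on your own data, run the query with EXPLAIN (no particular plan output is implied here):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   hoo
WHERE  hours @> f_hoo_time(now());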
SQL Fiddle (with lots of examples).
If you want to calculate the total hours of operation (per shop), here is a recipe (and a simple per-week total is sketched below the link):
- Calculate working hours between two dates in PostgreSQL
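The linked answer covers the general case between two given dates. For a plain total per staging week, a minimal aggregation sketch would be:

SELECT shop_id
     , sum(upper(hours) - lower(hours)) AS total_hours_per_week
FROM   hoo
GROUP  BY shop_id;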
Index and Performance
The containment operator for range types can be supported by a GiST or SP-GiST index. Either can be used to implement the exclusion constraint, but only GiST supports multicolumn indexes:
Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.
And the order of the index columns matters :
A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.
So we have conflicting interests here. For big tables, there will be many more distinct values for shop_id than for hours.
- A GiST index with leading shop_id is faster to write to, and it enforces the exclusion constraint.
- But our query searches on the hours column, so having that column first would serve it better.
- If we need to look up shop_id in other queries, a plain btree index is much faster for that (see the sketch after this list).
- To top it off, I found an SP-GiST index on just hours to be fastest for the query.
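A plain btree index for those shop_id lookups could simply be (the index name is my placeholder):

CREATE INDEX hoo_shop_id_idx ON hoo (shop_id);  -- placeholder name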
Benchmark
My script for creating dummy data:
INSERT INTO hoo(shop_id, hours)
SELECT id, hours
FROM   generate_series(1, 30000) id
     , generate_series(0, 6) d
     , f_hoo_hours(((date '1996-01-01' + d) + interval '4h'
                     + interval '15 min' * trunc(32 * random())) AT TIME ZONE 'UTC'
                  , ((date '1996-01-01' + d) + interval '12h'
                     + interval '15 min' * trunc(64 * random() * random())) AT TIME ZONE 'UTC') AS hours
WHERE  random() > .33;
This results in ~141k randomly generated rows, with 30k distinct shop_id and ~12k distinct hours. (Typically the disparity would be even bigger.) Table size is 8 MB.
I dropped and re-created the exclusion constraint with each column order:
ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id WITH =, hours WITH &&);  -- 4.4 sec !!
ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap EXCLUDE USING gist (hours WITH &&, shop_id WITH =);  -- 16.4 sec
With shop_id as the leading column it is ~ 4 times faster to create.
In addition, I tested two more indexes for read performance:
CREATE INDEX hoo_hours_gist_idx   ON hoo USING gist (hours);
CREATE INDEX hoo_hours_spgist_idx ON hoo USING spgist (hours);  -- !!
After VACUUM FULL ANALYZE hoo; I ran two queries (a sketch of both follows the list below):
- Q1: a late-night point in time, finding only 53 rows
- Q2: a point in time in the afternoon, finding 2423 rows.
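The exact benchmark queries are not reproduced here; they would look something like this, with the timestamps chosen only for illustration:

-- Q1: a quiet late-night point in time (illustrative literal only)
SELECT * FROM hoo WHERE hours @> f_hoo_time(timestamptz '2016-01-11 03:00+00');

-- Q2: a busy afternoon point in time (illustrative literal only)
SELECT * FROM hoo WHERE hours @> f_hoo_time(timestamptz '2016-01-13 15:00+00');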
Results
Index-only scan for each of them (except for "no index", of course):
index                  idx size   Q1        Q2
------------------------------------------------------
no index                          41.24 ms  41.2 ms
gist (shop_id, hours)   8 MB      14.71 ms  33.3 ms
gist (hours, shop_id)  12 MB       0.37 ms   8.2 ms
gist (hours)           11 MB       0.34 ms   5.1 ms
spgist (hours)          9 MB       0.29 ms   2.0 ms  -- !!
- SP-GiST and GiST are on a par for queries returning few results (GiST is even faster for very few).
- SP-GiST scales better with more results, and its index is smaller, too.
If you read a lot more than you write (the typical use case), keep the exclusion constraint as suggested at the beginning, and create an additional SP-GiST index on just hours to optimize read performance.