PostgreSQL index not used for range query

Question

I am using PostgreSQL (9.2.0) and have an IP range table. Here's the SQL:

 CREATE TABLE ips (
   id serial NOT NULL,
   begin_ip_num bigint,
   end_ip_num bigint,
   country_name character varying(255),
   CONSTRAINT ips_pkey PRIMARY KEY (id)
 );

I added indexes on both begin_ip_num and end_ip_num:

 CREATE INDEX index_ips_on_begin_ip_num ON ips USING btree (begin_ip_num);
 CREATE INDEX index_ips_on_end_ip_num ON ips USING btree (end_ip_num);

Here is the query:

 SELECT "ips".* FROM "ips" WHERE (3065106743 BETWEEN begin_ip_num AND end_ip_num);

The problem is that my BETWEEN query only uses the index on begin_ip_num. After the index scan, it filters the result on end_ip_num. Here is the output of EXPLAIN ANALYZE:

 Index Scan using index_ips_on_begin_ip_num on ips  (cost=0.00..2173.83 rows=27136 width=76) (actual time=16.349..16.350 rows=1 loops=1)
   Index Cond: (3065106743::bigint >= begin_ip_num)
   Filter: (3065106743::bigint <= end_ip_num)
   Rows Removed by Filter: 47596
 Total runtime: 16.425 ms
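The runtime of that plan depends on where the address falls. The index condition only bounds begin_ip_num, so every range that starts at or below the address is fetched and then thrown away by the end_ip_num filter. The minimal Python model below illustrates this. It uses made-up, non-overlapping ranges rather than the real table, and `index_scan_with_filter` is a hypothetical helper, not anything PostgreSQL exposes.

```python
from bisect import bisect_right

# Made-up table: 1,000 non-overlapping ranges of 1,000 addresses each, stored as
# (begin_ip_num, end_ip_num) and sorted by begin_ip_num, the order a B-tree
# index on begin_ip_num would return them in.
ROWS = [(i * 1000, i * 1000 + 999) for i in range(1000)]
BEGINS = [begin for begin, _ in ROWS]


def index_scan_with_filter(x):
    """Model 'Index Cond: x >= begin_ip_num' followed by 'Filter: x <= end_ip_num'."""
    # The index can only stop the scan once begin_ip_num exceeds x ...
    candidates = ROWS[:bisect_right(BEGINS, x)]
    # ... so every candidate still has to be checked against end_ip_num.
    matches = [row for row in candidates if x <= row[1]]
    removed = len(candidates) - len(matches)
    return matches, removed


print(index_scan_with_filter(750_500))  # ([(750000, 750999)], 750)
```

One row matches, but 750 rows are read and discarded. This is the same shape as "rows=1" next to "Rows Removed by Filter: 47596" in the plan above.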

I have already tried various combinations of indexes, including a composite index on both begin_ip_num and end_ip_num.

indexing postgresql database-design between

Zain zafar Jan 18 '13 at 21:13

4 answers

I'm a little late to the party, but this works very well for me.

Consider installing the ip4r extension. It lets you define a column that holds IP ranges. The name suggests it is IPv4-only, but it currently supports IPv6 as well.

After filling that column with the ranges, all you need to do is create a GiST index:

 CREATE INDEX ip_zip_ip4_range ON ip_zip USING gist (ip4_range);

I have almost 10 million ranges in my database, yet lookups take a fraction of a millisecond:

 region=> select count(*) from ip_zip ;
   count
 ---------
  9566133

 region=> explain analyze select * from ip_zip where '8.8.8.8'::ip4 <<= ip4_range;
                                                           QUERY PLAN
 ------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.085..0.086 rows=1 loops=1)
    Recheck Cond: ('8.8.8.8'::ip4r <<= ip4_range)
    Heap Blocks: exact=1
    ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.055..0.055 rows=1 loops=1)
          Index Cond: ('8.8.8.8'::ip4r <<= ip4_range)
  Planning time: 0.106 ms
  Execution time: 0.118 ms
 (7 rows)

 region=> explain analyze select * from ip_zip where '254.50.22.54'::ip4 <<= ip4_range;
                                                           QUERY PLAN
 ------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on ip_zip  (cost=234.55..25681.29 rows=9566 width=22) (actual time=0.059..0.059 rows=1 loops=1)
    Recheck Cond: ('254.50.22.54'::ip4r <<= ip4_range)
    Heap Blocks: exact=1
    ->  Bitmap Index Scan on ip_zip_ip4_range  (cost=0.00..232.16 rows=9566 width=0) (actual time=0.048..0.048 rows=1 loops=1)
          Index Cond: ('254.50.22.54'::ip4r <<= ip4_range)
  Planning time: 0.102 ms
  Execution time: 0.145 ms
 (7 rows)
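ip4r works with typed address values instead of the bigint numbers stored in the question's table. The lookup itself is still an inclusive containment test; the GiST index is what makes it fast. The Python sketch below uses the standard `ipaddress` module to show how the two representations map onto each other. `ip_to_num` and `in_range` are hypothetical helpers, not part of ip4r, and the 8.8.8.0 to 8.8.8.255 range is made up for illustration.

```python
import ipaddress


def ip_to_num(ip: str) -> int:
    """Dotted-quad IPv4 address -> the bigint form used in the question's table."""
    return int(ipaddress.IPv4Address(ip))


def in_range(ip: str, first: str, last: str) -> bool:
    """Inclusive containment test, roughly what '<<=' checks for a single address."""
    return ip_to_num(first) <= ip_to_num(ip) <= ip_to_num(last)


print(ip_to_num("182.177.209.55"))  # 3065106743, the constant from the question
print(in_range("8.8.8.8", "8.8.8.0", "8.8.8.255"))  # True (made-up range)
```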

Derek May 13, '15 at 18:48

I had exactly the same problem on an almost identical dataset, the free GeoIP table from maxmind.com. I solved it using Erwin's advice on range types and GiST indexes. The GiST index was the key: without it I was querying at best about 3 rows per second; with it I queried about 500,000 rows in under 10 seconds! Since Erwin did not post detailed instructions on how to do this, I thought I would add them here...

First, add a new column of a range type; note that int8range is the one required for bigint values. Then populate it; the '[]' argument makes the range inclusive at both the lower and the upper bound (see the manual). Finally, add an index; the GiST index is where all the performance benefit comes from.

 alter table ips add column iprange int8range;
 update ips set iprange = int8range(begin_ip_num, end_ip_num, '[]');
 create index index_ips_on_iprange on ips using gist (iprange);
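The '[]' argument matters because int8range is a discrete range type. PostgreSQL stores such ranges in the canonical [lower, upper) form, so int8range(a, b, '[]') is stored as [a, b+1). With the default '[)' bounds, the address in end_ip_num would fall outside its own range. The Python sketch below models that canonicalization. It ignores empty ranges and NULL or infinite bounds, and the function names only mimic the SQL ones; they are not a real API.

```python
def int8range(lower: int, upper: int, bounds: str = "[)") -> tuple:
    """Return (lo, hi) meaning lo <= x < hi, the canonical '[)' form that
    PostgreSQL uses for discrete range types such as int8range."""
    lo = lower if bounds[0] == "[" else lower + 1
    hi = upper + 1 if bounds[1] == "]" else upper
    return lo, hi


def contained_by(x: int, rng: tuple) -> bool:
    """Analogue of 'x <@ rng' for a non-empty canonical range."""
    lo, hi = rng
    return lo <= x < hi


inclusive = int8range(100, 200, "[]")  # (100, 201): both endpoints included
default = int8range(100, 200)          # (100, 200): upper endpoint excluded
print(contained_by(200, inclusive), contained_by(200, default))  # True False
```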

With that in place, you can use the <@ ("is contained by") operator to look up specific addresses in the table. See http://www.postgresql.org/docs/9.2/static/functions-range.html

 SELECT "ips".* FROM "ips" WHERE (3065106743::bigint <@ iprange);

pbnelson Mar 26 '14 at 20:18

I believe your query looks like WHERE [constant] BETWEEN begin_ip_num AND end_ip_num, or equivalently WHERE begin_ip_num <= [constant] AND end_ip_num >= [constant].

As far as I know, Postgres does not have an AND-EQUAL access plan, so you need a two-column composite index, as suggested by Erwin Brandstetter.

a1ex07 Jan 18 '13 at 21:32

Erwin Brandstetter · Accepted Answer · 2013-01-18T21:21:18+0000

Try a multicolumn index, but with reversed sort order on the second column:

 CREATE INDEX index_ips_begin_end_ip_num ON ips (begin_ip_num, end_ip_num DESC);

Sort order is mostly irrelevant for a single-column index, since it can be scanned backwards almost as fast. But it matters for multicolumn indexes.

With the index I suggest, Postgres can scan the index up to the point where the first condition (begin_ip_num <= constant) stops holding. For each value of the first column, it can return rows as long as the second condition (end_ip_num >= constant) holds, then jump to the next value of the first column, and so on.
This is still not very efficient, and Postgres may be faster just scanning the first index column and filtering on the second. A lot depends on your data distribution.
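The scan described above can be modeled in a few lines of Python. This sketch of the access pattern uses made-up index entries and is not PostgreSQL's actual B-tree code. It counts the index entries read by the composite-index scan and compares that with a scan that only bounds begin_ip_num.

```python
from itertools import groupby

# Made-up index entries (begin_ip_num, end_ip_num, row_id), ordered the way an
# index on (begin_ip_num, end_ip_num DESC) stores them.
ENTRIES = sorted(
    [
        (100, 500, "a"), (100, 150, "b"), (100, 140, "c"), (100, 130, "d"),
        (200, 300, "e"), (200, 250, "f"), (200, 220, "g"),
        (400, 450, "h"),
    ],
    key=lambda e: (e[0], -e[1]),
)


def composite_scan(x):
    """Return (matching row ids, index entries read) for x BETWEEN begin AND end."""
    matches, read = [], 0
    for begin, group in groupby(ENTRIES, key=lambda e: e[0]):
        if begin > x:  # first condition fails here and for every later group
            break
        for _, end, row_id in group:  # end values arrive in descending order
            read += 1
            if end < x:  # nothing later in this group can match: jump ahead
                break
            matches.append(row_id)
    return matches, read


def filter_scan_reads(x):
    """Entries a begin_ip_num-only scan reads before filtering on end_ip_num."""
    return sum(1 for begin, _, _ in ENTRIES if begin <= x)


print(composite_scan(260), filter_scan_reads(260))  # (['a', 'e'], 4) 7
```

The descending order on the second column is what allows the early `break`. Within one begin_ip_num value, the first entry whose end lies below the address ends that group.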

What would really help here is a GiST index on an int8range column, available since PostgreSQL 9.2.

If that's not an option, you could check out this closely related answer on dba.SE, which uses a rather sophisticated regime of partial indexes. Advanced stuff, but it delivers excellent performance.

Either way, CLUSTER using the multicolumn index above can help performance:

 CLUSTER ips USING index_ips_begin_end_ip_num;

This way, candidates fulfilling your first condition are packed onto the same or adjacent data pages. That can help performance a lot if you have many rows per value of the first column; otherwise it is hardly effective.
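The rough Python model below shows why CLUSTER helps here. It counts how many heap pages must be read to fetch the same candidate rows when they are scattered across the table versus stored contiguously. The table size, the number of rows per page and the random layout are all made-up assumptions, not measurements.

```python
import random

ROWS_PER_PAGE = 100  # assumption: a fixed number of rows fits on each heap page


def pages_touched(row_positions):
    """Number of distinct heap pages that hold the given row positions."""
    return len({pos // ROWS_PER_PAGE for pos in row_positions})


TOTAL_ROWS, CANDIDATES = 10_000, 500
random.seed(0)
# Before CLUSTER: the candidate rows sit at arbitrary positions in the heap.
scattered = random.sample(range(TOTAL_ROWS), CANDIDATES)
# After CLUSTER on the index: the candidates are stored next to each other.
clustered = range(CANDIDATES)

print(pages_touched(scattered), pages_touched(clustered))
```

The clustered layout needs only 5 pages for 500 rows. The scattered layout typically touches most of the table's 100 pages.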

Also, is autovacuum running, or have you run ANALYZE on the table? Postgres needs current statistics to pick appropriate query plans.
