Automatic reformulation of a condition in a PostgreSQL view

I have a table like this:

[mytable]

 id | min | max | funobj
 ---+-----+-----+---------------------
  1 |  15 |  23 | {some big object}
  2 |  23 |  41 | {another big object}
  3 |  19 |  27 | {next big object}

Now suppose I have a view created like this:

 CREATE VIEW functionvalues AS SELECT id, evaluate(funobj) FROM mytable 

where evaluate is a set-returning function that evaluates the large funobj. The result of the view might look something like this:

 id | evaluate
 ---+---------
  1 |      15
  1 |      16
  1 |     ...
  1 |      23
  2 |      23
  2 |      24
  2 |     ...
  2 |      41
 ...

I have no information about the specific values that will be produced, but I know that they will always lie between the min and max given in mytable (bounds included).

Finally, I (or rather, a third-party application) query the view:

 SELECT * FROM functionvalues WHERE evaluate BETWEEN somevalue AND anothervalue 

In this case, Postgres evaluates the function for every row in mytable, even though, given the WHERE clause, the function need not be evaluated at all for rows whose min and max do not overlap that range. Since evaluate() is a rather slow function, this gives me very poor performance.

It would be best to directly query the table using

 SELECT *
 FROM (
     SELECT id, evaluate(funobj)
     FROM mytable
     WHERE max BETWEEN somevalue AND anothervalue
        OR min BETWEEN somevalue AND anothervalue
        OR (min < somevalue AND max > anothervalue)
 ) AS innerquery
 WHERE evaluate BETWEEN somevalue AND anothervalue
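(As a side note from editing, not part of the original question: the three-way OR above is the classic interval-overlap test and can be written more compactly. Assuming min <= max and somevalue <= anothervalue, the pre-filter is equivalent to:)

```sql
-- Two intervals [min, max] and [somevalue, anothervalue] overlap
-- exactly when each one starts before the other one ends.
SELECT id, evaluate(funobj)
FROM mytable
WHERE min <= anothervalue
  AND max >= somevalue;
```

A plain b-tree index on (min, max) may then help the planner skip non-overlapping rows before evaluate() is ever called.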

Is there any way to get Postgres to execute a query like the one above (using smart indexes or something similar) without changing how the third-party application queries the view?

PS: Feel free to suggest a better title for this question; the one I gave is... well... rather non-specific.

+4
3 answers

Postgres cannot push the constraints in the query tree down into a function; the function always has to scan and return the entire underlying table. And then it is joined back against that same table. Sigh. "Breaking open" the function body and merging it with the rest of the query would require a macro-like facility instead of a function.

The best way out would probably be to not use an unrestricted set-returning function, but to rewrite it as a scalar function that takes a single data row as its argument and returns its value.
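A rough sketch of that idea (the function name evaluate_at, its signature, and the integer-valued domain are all my assumptions, not from the original post): define a scalar function for a single point and let the view enumerate the candidate points itself. A plain SQL view like this can be inlined by the planner, unlike an opaque set-returning function.

```sql
-- Hypothetical scalar variant: evaluates funobj at one point only.
-- The body here is a placeholder.
CREATE FUNCTION evaluate_at(funobj bytea, x integer)
RETURNS integer AS $$ SELECT x /* placeholder */ $$ LANGUAGE sql IMMUTABLE;

-- The view enumerates points between min and max (requires
-- PostgreSQL 9.3+ for LATERAL); as plain inlinable SQL it can be
-- combined with extra pre-filters on min/max, as in the inner
-- query from the question.
CREATE VIEW functionvalues AS
SELECT t.id, evaluate_at(t.funobj, s.x) AS evaluate
FROM mytable t
CROSS JOIN LATERAL generate_series(t.min, t.max) AS s(x);
```

This does not by itself skip any rows, but it replaces the black-box set-returning call with SQL the planner can reason about.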

There is also the problem of sort order: the outer query does not know about any ordering delivered by the function, so explicit sort and merge steps will be needed, except perhaps for very small result sets (no statistics are available for function results, only the cost and estimated row count, IIRC).

0

I do not have a complete answer, but some of your catchwords ring a distant bell with me:

  • you have a view
  • you want a smarter view
  • you want to "rewrite" the view definition

That calls for the PostgreSQL Rule System, especially the "Views and the Rule System" part. Perhaps you can use it to your advantage.

Be warned: this is treacherous stuff. First you'll find it great, then you'll pet it, then it will rip your arm off without warning while it is still purring. Follow the links in here.

+1

Sometimes the right answer is "faster hardware". Given how the PostgreSQL optimizer works, moving the table and its indexes to a solid-state drive might be your best option.

Tablespace Documentation

Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.
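For completeness, the mechanics look roughly like this (the mount-point path, tablespace name, and index name are placeholders of mine, not from the question):

```sql
-- Create a tablespace on the SSD mount point (path is a placeholder).
CREATE TABLESPACE ssdspace LOCATION '/mnt/ssd/pgdata';

-- Move the hot table and a hypothetical index onto it.
ALTER TABLE mytable SET TABLESPACE ssdspace;
ALTER INDEX mytable_minmax_idx SET TABLESPACE ssdspace;
```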

In October 2011, you can get a really good 128 GB SSD for under $300, or 300 GB for under $600.

If you are looking for a two-orders-of-magnitude improvement, and you already know that your evaluate() function is the bottleneck, you will probably have to piece together smaller gains from many sources. If I were you, I'd see whether any of these things would help.

  • solid-state drive (speedup factor of 50 over a HDD, but you say you are not I/O-bound, so call it "2")
  • faster CPU (speedup 1.2)
  • more RAM (speedup 1.02)
  • different algorithms in evaluate() (say, 2)
  • different data structures in evaluate() (in a database function, probably 0)
  • different C compiler optimizations (0)
  • a different C compiler (1.1)
  • rewriting critical parts of evaluate() in assembler (2.5)
  • a different DBMS platform
  • different database technology

Those estimates suggest a speedup of no more than about 13x (but that's little more than guesswork).
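Multiplying the non-zero estimates in the list above does reproduce that figure:

```
2 × 1.2 × 1.02 × 2 × 1.1 × 2.5 ≈ 13.5
```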

I might even think about targeting a GPU for the calculation.

0
