What does PostgreSQL explain to me?

MySQL explains the conclusion is quite simple. PostgreSQL is a bit more complicated. I, too, could not find a good resource that explains this.

Can you describe what exactly explains, speaks, or at least points me towards a good resource?

+81
postgresql
Sep 22 '08 at 20:08
source share
7 answers

Explaining_EXPLAIN.pdf can also help.

+49
Sep 23 '08 at 15:48
source share

It runs from most indentation to least indentation, and I believe in the bottom of the plan up. (So, if there are two indented sections, first the first step down the page is performed, and then when they meet with another execution, then the union connecting them is performed.)

The idea is that at each step there are 1 or 2 data sets that arrive and are processed by some rule. If there is only one data set, this operation is performed with this data set. (For example, scan the index to find out which rows you want, filter the data set or sort it.) If two, two data sets are two things that fall back further, and the rule that you see is attached to them. The meaning of most of the rules can be reasonably easily guessed (especially if you read a bunch of explanation plans earlier), but you can try to verify individual elements by looking at the documentation or (simply) simply by throwing this phrase into Google, as well as several keywords, for example EXPLAIN .

This is obviously not a complete explanation, but it provides enough context that you can usually figure out what you want. For example, consider this plan from an actual database:

 explain analyze select a.attributeid, a.attributevalue, b.productid from orderitemattribute a, orderitem b where a.orderid = b.orderid and a.attributeid = 'display-album' and b.productid = 'ModernBook'; ------------------------------------------------------------------------------------------------------------------------------------------------------------ Merge Join (cost=125379.14..125775.12 rows=3311 width=29) (actual time=841.478..841.478 rows=0 loops=1) Merge Cond: (a.orderid = b.orderid) -> Sort (cost=109737.32..109881.89 rows=57828 width=23) (actual time=736.163..774.475 rows=16815 loops=1) Sort Key: a.orderid Sort Method: quicksort Memory: 1695kB -> Bitmap Heap Scan on orderitemattribute a (cost=1286.88..105163.27 rows=57828 width=23) (actual time=41.536..612.731 rows=16815 loops=1) Recheck Cond: ((attributeid)::text = 'display-album'::text) -> Bitmap Index Scan on (cost=0.00..1272.43 rows=57828 width=0) (actual time=25.033..25.033 rows=16815 loops=1) Index Cond: ((attributeid)::text = 'display-album'::text) -> Sort (cost=15641.81..15678.73 rows=14769 width=14) (actual time=14.471..16.898 rows=1109 loops=1) Sort Key: b.orderid Sort Method: quicksort Memory: 76kB -> Bitmap Heap Scan on orderitem b (cost=310.96..14619.03 rows=14769 width=14) (actual time=1.865..8.480 rows=1114 loops=1) Recheck Cond: ((productid)::text = 'ModernBook'::text) -> Bitmap Index Scan on id_orderitem_productid (cost=0.00..307.27 rows=14769 width=0) (actual time=1.431..1.431 rows=1114 loops=1) Index Cond: ((productid)::text = 'ModernBook'::text) Total runtime: 842.134 ms (17 rows) 

Try reading it for yourself and see if it makes sense.

What I'm reading is that the database first scans the id_orderitem_productid index, using this to find the desired rows from orderitem , then sorts this dataset using quicksort (the used sort will change if the data does not fit in RAM), then sets this to the side.

He then scans orditematt_attributeid_idx to find the desired rows from orderitemattribute , and then sorts this dataset using quick sort.

He then takes two data sets and combines them. (Merge merging is a kind of β€œlocking” operation when it moves two sorted datasets in parallel, emitting a merged row when they match.)

As I said, you work through the inside of the plan with the outside, from bottom to top.

+35
Sep 22 '08 at 22:59
source share

The part that I have always considered confusing is the startup costs and the total cost. I google it every time I forget about it, which brings me back here, which does not explain the difference, so I am writing this answer. This is what I learned from the Postgres EXPLAIN documentation , explained as I understand it.

Here is an example from the application that manages the forum:

 EXPLAIN SELECT * FROM post LIMIT 50; Limit (cost=0.00..3.39 rows=50 width=422) -> Seq Scan on post (cost=0.00..15629.12 rows=230412 width=422) 

Here's a graphical explanation from PgAdmin:

graphical explanation of the first request

(When you use PgAdmin, you can hover over the component to read the cost data.)

Cost is presented as a tuple, for example. the cost of LIMIT is cost=0.00..3.39 , and the cost of sequential post scanning is cost=0.00..15629.12 . The first number in the tuple is the launch cost, and the second is the total cost. Since I used EXPLAIN and not EXPLAIN ANALYZE , these costs are estimates, not actual measures.

  • Launch cost is a complex concept. It does not just represent the amount of time before starting this component. It represents the amount of time between when the component starts execution (reading in data) and when the component displays its first line .
  • The total cost is the total runtime of the component, starting from the moment of reading data until the completion of writing its output.

As a complication, each "parent" cost of a node includes the cost of its child nodes. In a textual representation, the tree is indented, for example. LIMIT is the parent node and Seq Scan is its child. In the PgAdmin view, arrows indicate from child to parent β€” the direction of the data stream β€” which may be inaccurate if you are familiar with graph theory.

The documentation says that the costs include all the child nodes, but note that the total cost of the parent 3.39 much less than the total cost of its child 15629.12 . The total cost is not included because a component, such as LIMIT , does not need to process all its input. See EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000 LIMIT 2; in the Postgres EXPLAIN documentation .

In the above example, the start time for both components is zero, since none of the components should do any processing before it starts writing rows: sequential scanning reads the first row of the table and emits it. LIMIT reads its first row and then emits it.

When does a component have to do a lot of processing before it can begin to print any lines? There are many possible reasons, but let's look at one striking example. Here's the same query from earlier, but now containing an ORDER BY :

 EXPLAIN SELECT * FROM post ORDER BY body LIMIT 50; Limit (cost=23283.24..23283.37 rows=50 width=422) -> Sort (cost=23283.24..23859.27 rows=230412 width=422) Sort Key: body -> Seq Scan on post (cost=0.00..15629.12 rows=230412 width=422) 

And graphically:

graphic explanation of the second request

Again, sequential scanning on post does not have a launch cost: it immediately prints lines. But sorting has a significant launch cost of 23283.24 , because it must sort the entire table before it even displays at least one row. The total cost of sorting 23859.27 slightly higher than the cost of starting, which reflects the fact that once the entire data set has been sorted, the sorted data can be emitted very quickly.

Please note that the start time of LIMIT 23283.24 exactly matches the start time of the sort. This is due not only to the fact that LIMIT has a long start time. In fact, it actually has a zero start time, but EXPLAIN collapses all child costs for each parent, so the LIMIT start time includes the start time of its children.

This set of costs can make it difficult to understand the cost of each individual component. For example, our LIMIT has zero start time, but this is not obvious at first glance. For this reason, several other people are associated with explain.depesz.com , a tool created by Hubert Lubaczewski (aka depesz) that helps to understand EXPLAIN by - among other things - subtract the cost of a child from material costs. He mentions some other difficulties in a short blog post about his tool.

+21
Feb 19 '16 at 17:10
source share

There is also an online help tool, Depesz , which indicates where the expensive parts of the analysis results are.

also has one, here are the same results that give me a clearer description of the problem.

+15
Sep 24 '08 at 10:47
source share

PgAdmin will show you a graphical representation of the explanation plan. Switching between them can help you understand what text representation means. However, if you just want to know what he is going to do, you can always use the graphical interface.

+13
Sep 23 '08 at 17:39
source share

The official PostgreSQL documentation contains an interesting, detailed explanation of how to understand the conclusion of the explanation.

+6
Mar 18 '13 at 0:09
source share

If you install pgadmin, there is an Explain button that, as well as text output, draws diagrams of what happens, showing filters, sorts, and subsets that I find useful to see what happens.

0
Sep 25 '08 at 13:15
source share



All Articles