I am running an UPDATE on a table containing 250 million rows with 3 indexes; the UPDATE uses a second table containing 30 million rows. It has been running for about 36 hours now. I am wondering whether there is a way to find out how close it is to completion: if it is going to take a million days to do its job, I will kill it; but if it only needs a day or two, I will let it run. Here is the query:
UPDATE pagelinks
SET pl_to = page_id
FROM page
WHERE (pl_namespace, pl_title) = (page_namespace, page_title)
  AND page_is_redirect = 0;
The EXPLAIN is not the issue here; I only mention the big table and its multiple indexes to justify somewhat why the UPDATE takes so long. But here is the EXPLAIN anyway:
Merge Join  (cost=127710692.21..135714045.43 rows=452882848 width=57)
  Merge Cond: (("outer".page_namespace = "inner".pl_namespace) AND ("outer"."?column4?" = "inner"."?column5?"))
  ->  Sort  (cost=3193335.39..3219544.38 rows=10483593 width=41)
        Sort Key: page.page_namespace, (page.page_title)::text
        ->  Seq Scan on page  (cost=0.00..439678.01 rows=10483593 width=41)
              Filter: (page_is_redirect = 0::numeric)
  ->  Sort  (cost=124517356.82..125285665.74 rows=307323566 width=46)
        Sort Key: pagelinks.pl_namespace, (pagelinks.pl_title)::text
        ->  Seq Scan on pagelinks  (cost=0.00..6169460.66 rows=307323566 width=46)
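One idea I had for gauging progress (a sketch, assuming pg_total_relation_size is available, i.e. PostgreSQL 8.1 or later): since the UPDATE writes a new version of every modified row, the table's on-disk size grows as it proceeds, and the size functions only read metadata, so they can be run from a separate session:

-- Run periodically from another connection; the growth rate gives a rough
-- sense of how many new row versions have been written so far.
SELECT pg_size_pretty(pg_total_relation_size('pagelinks'));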
Meanwhile, in a parallel connection, I also issued a command to DROP one of the pagelinks indexes; of course it is waiting for the UPDATE to finish (but I felt like trying anyway!). As a consequence, I cannot even SELECT from pagelinks for fear of corrupting the data (unless you think it would be safe to kill the postmaster process dedicated to the DROP INDEX?).
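If it helps, here is the kind of query I could run from yet another connection to see what the DROP INDEX is waiting on (a sketch; the column names are for PostgreSQL 9.2+, older versions use procpid and current_query instead of pid and query):

-- Show all locks touching pagelinks and whether each one has been granted;
-- the DROP INDEX should show granted = false while the UPDATE holds its lock.
SELECT l.pid, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'pagelinks'::regclass;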
So I am wondering whether there is a system table that tracks the number of dead tuples or something of the sort; it would be nice to know how fast, or how far along, the UPDATE is in its task.
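For what it's worth, this is the sort of query I have in mind (a sketch, assuming the statistics collector is enabled and that pg_stat_user_tables exposes n_tup_upd and n_dead_tup, which it does from roughly 8.3 onward):

-- Per-table update and dead-tuple counters from the statistics collector.
-- Caveat: these counters are only updated when a transaction ends, so they
-- may not reflect the still-running UPDATE until it commits or aborts.
SELECT relname, n_tup_upd, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'pagelinks';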
Thanks (PostgreSQL is not as smart as I thought; it needs some heuristics).
sql sql-update postgresql
Nicholas Leonard