EDIT:
Since you guys suggested creating separate tables for the player / tournament names and replacing the text columns with foreign keys, I did the following:
    SELECT DISTINCT tournament INTO tournaments FROM chess_data2;
    ALTER TABLE tournaments ADD COLUMN id SERIAL PRIMARY KEY;
I repeated that for namew and nameb, and then tried to replace the strings with foreign keys. That's where it got tricky: I couldn't get the replacement to finish in a reasonable amount of time.
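For the players I did essentially the same thing, collecting both namew and nameb into a single players table. Roughly (this is a sketch; the players, name and id identifiers are the ones the queries below rely on):

    -- collect the distinct player names from both colour columns
    SELECT name INTO players
    FROM (SELECT namew AS name FROM chess_data2
          UNION
          SELECT nameb FROM chess_data2) AS all_names;

    -- add a surrogate key, as for tournaments
    ALTER TABLE players ADD COLUMN id SERIAL PRIMARY KEY;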
I tried a few things:

1) Dropped the existing indexes
2) Created individual indexes on namew, nameb and tournament separately (see the sketch after the query below)
3) Ran a query that inserts the necessary data into a new table:
    SELECT date, whiterank, blackrank, t_round, result,
           (SELECT p.id FROM players p WHERE c_d2.namew = p.name) AS whitep,
           (SELECT p2.id FROM players p2 WHERE c_d2.nameb = p2.name) AS blackp,
           (SELECT t.id FROM tournaments t WHERE c_d2.tournament = t.t_name) AS tournament
    INTO final_chess
    FROM chess_data2 c_d2;
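For steps 1 and 2 the statements were along these lines (sketched here; the names of the new single-column indexes are simply what I picked):

    -- 1) drop the composite indexes listed in the schema below
    DROP INDEX white_index;
    DROP INDEX black_index;
    DROP INDEX w_b_t_d_index;

    -- 2) index namew, nameb and tournament individually
    CREATE INDEX namew_index ON chess_data2 (namew);
    CREATE INDEX nameb_index ON chess_data2 (nameb);
    CREATE INDEX tournament_index ON chess_data2 (tournament);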
Unfortunately, the query was still very slow, so I went back to user Boris Schegolev. In a comment he suggested creating a new column in the existing chess_data2 table and updating it, so I did this:
    ALTER TABLE chess_data2 ADD COLUMN namew_id INTEGER;
    UPDATE chess_data2 cd2 SET namew_id = (SELECT id FROM players WHERE name = cd2.namew);
I started these queries half an hour ago; the first one was instantaneous, but the second is still running.
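Once (if) that UPDATE ever finishes, I would still have to do the same for the other two text columns, presumably something like:

    ALTER TABLE chess_data2 ADD COLUMN nameb_id INTEGER;
    UPDATE chess_data2 cd2 SET nameb_id = (SELECT p.id FROM players p WHERE p.name = cd2.nameb);

    ALTER TABLE chess_data2 ADD COLUMN tournament_id INTEGER;
    UPDATE chess_data2 cd2 SET tournament_id = (SELECT t.id FROM tournaments t WHERE t.t_name = cd2.tournament);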
What should I do now?
INITIAL QUESTION:
Database Schema:
date DATE
namew TEXT
nameb TEXT
whiterank INTEGER
blackrank INTEGER
tournament TEXT
t_round INTEGER
result REAL
id BIGINT

Indexes:
chess_data2_pkey (id) UNIQUE
white_index (namew, tournament, date)
black_index (nameb, tournament, date)
w_b_t_d_index (namew, nameb, tournament, date)
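For clarity, the table and its indexes boil down to roughly this DDL (recreated by hand from the listing above, so minor details such as NOT NULL constraints may be missing):

    CREATE TABLE chess_data2 (
        id         BIGINT PRIMARY KEY,   -- backed by chess_data2_pkey
        date       DATE,
        namew      TEXT,                 -- white player
        nameb      TEXT,                 -- black player
        whiterank  INTEGER,
        blackrank  INTEGER,
        tournament TEXT,
        t_round    INTEGER,
        result     REAL
    );

    CREATE INDEX white_index   ON chess_data2 (namew, tournament, date);
    CREATE INDEX black_index   ON chess_data2 (nameb, tournament, date);
    CREATE INDEX w_b_t_d_index ON chess_data2 (namew, nameb, tournament, date);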
Problem:
The following statement performs reasonably well (about 60-70 seconds on a database with 3 million records):
Meanwhile, the next query (which looks EXACTLY the same as the statements above) takes forever to finish.
I tried a different approach (using SUM), and it did not help either:
Any idea what is going on here and how to fix it? I am using Python 3.5 and psycopg2 in PyCharm to run these queries. I will be happy to provide any additional information, as this is a very important project for me.
EXPLAIN ANALYZE (for the last query):
    Seq Scan on chess_data2 t1  (cost=0.00..49571932.96 rows=2879185 width=86) (actual time=0.061..81756.896 rows=2879185 loops=1)
      SubPlan 1
        ->  Aggregate  (cost=8.58..8.59 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=2879185)
              ->  Index Only Scan using white_index on chess_data2 t2  (cost=0.56..8.58 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=2879185)
                    Index Cond: ((namew = t1.namew) AND (tournament = t1.tournament) AND (date < t1.date))
                    Filter: (t1.date < (date + 90))
                    Rows Removed by Filter: 1
                    Heap Fetches: 6009767
      SubPlan 2
        ->  Aggregate  (cost=8.58..8.59 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=2879185)
              ->  Index Only Scan using black_index on chess_data2 t2_1  (cost=0.56..8.58 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=2879185)
                    Index Cond: ((nameb = t1.namew) AND (tournament = t1.tournament) AND (date < t1.date))
                    Filter: (t1.date < (date + 90))
                    Rows Removed by Filter: 1
                    Heap Fetches: 5303160
    Planning time: 0.161 ms
    Execution time: 81883.716 ms
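In case it helps to read the plan together with the statement: the slow query is essentially of this shape (simplified by hand; the output aliases are made up, but the predicates match the Index Cond and Filter lines above):

    SELECT t1.*,
           (SELECT count(*)
              FROM chess_data2 t2
             WHERE t2.namew = t1.namew
               AND t2.tournament = t1.tournament
               AND t2.date < t1.date
               AND t1.date < t2.date + 90) AS games_as_white,
           (SELECT count(*)
              FROM chess_data2 t2
             WHERE t2.nameb = t1.namew
               AND t2.tournament = t1.tournament
               AND t2.date < t1.date
               AND t1.date < t2.date + 90) AS games_as_black
    FROM chess_data2 t1;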