I am dealing with a Postgres table (called "lives") that contains columns for time_stamp, usr_id, trans_id, and lives_remaining. I need a query that will give me the latest lives_remaining total for each usr_id:
- There are several users (different usr_id)
- time_stamp is not a unique identifier: sometimes user events (one row each in the table) occur with the same time_stamp.
- trans_id is unique only over very small time ranges: it repeats over time.
- lives_remaining (for a given user) can both increase and decrease over time.
Example:
time_stamp | lives_remaining | usr_id | trans_id
------------------------------------------------
07:00      | 1               | 1      | 1
09:00      | 4               | 2      | 2
10:00      | 2               | 3      | 3
10:00      | 1               | 2      | 4
11:00      | 4               | 1      | 5
11:00      | 3               | 1      | 6
13:00      | 3               | 3      | 1
Since I will also need to access the other columns of the row holding the latest data for each usr_id, I need a query that gives a result like this:
time_stamp | lives_remaining | usr_id | trans_id
------------------------------------------------
11:00      | 3               | 1      | 6
10:00      | 1               | 2      | 4
13:00      | 3               | 3      | 1
As mentioned, each usr_id can gain and lose lives, and sometimes these timestamped events occur so close together that they end up with the same time stamp! Therefore this query will not work:
SELECT b.time_stamp, b.lives_remaining, b.usr_id, b.trans_id
FROM (SELECT usr_id, MAX(time_stamp) AS max_timestamp
      FROM lives
      GROUP BY usr_id
      ORDER BY usr_id) a
JOIN lives b ON a.max_timestamp = b.time_stamp
Instead, I need to use both time_stamp (first) and trans_id (second, as a tie-breaker) to determine the correct row. I also need to pass that information from the subquery to the main query, which will then supply the other columns of the matching rows. This is the hacked-together query I have gotten working:
SELECT b.time_stamp, b.lives_remaining, b.usr_id, b.trans_id
FROM (SELECT usr_id, MAX(time_stamp || '*' || trans_id) AS max_timestamp_transid
      FROM lives
      GROUP BY usr_id
      ORDER BY usr_id) a
JOIN lives b ON a.max_timestamp_transid = b.time_stamp || '*' || b.trans_id
ORDER BY b.usr_id
Okay, this works, but I don't like it. It requires a query within a query and a self join, and it seems to me it could be much simpler by grabbing the row that MAX determined to have the largest time_stamp and trans_id. The lives table has tens of millions of rows to parse, so I would like this query to be as fast and efficient as possible. I am new to RDBMS and Postgres in particular, so I know I need to make effective use of the proper indexes. I am a bit lost on how to optimize.
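For reference, here is a minimal sketch of the kind of single-pass query I am hoping exists, using Postgres's DISTINCT ON extension (an assumption on my part; I have not verified it against the real table):

-- Keep only the first row per usr_id, where "first" means latest
-- time_stamp, with the largest trans_id breaking ties.
SELECT DISTINCT ON (usr_id)
       time_stamp, lives_remaining, usr_id, trans_id
FROM lives
ORDER BY usr_id, time_stamp DESC, trans_id DESC;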
I found a similar discussion here. Can I use some sort of Postgres equivalent of an Oracle analytic function?
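If window functions are the Postgres counterpart (I believe they are available from Postgres 8.4 onward), a sketch of what the analytic-style query might look like:

-- Rank each user's rows by time_stamp (latest first), breaking ties
-- with trans_id, then keep only the top-ranked row per user.
SELECT time_stamp, lives_remaining, usr_id, trans_id
FROM (SELECT l.*,
             ROW_NUMBER() OVER (PARTITION BY usr_id
                                ORDER BY time_stamp DESC, trans_id DESC) AS rn
      FROM lives l) ranked
WHERE rn = 1
ORDER BY usr_id;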
Any advice on accessing column information from the row selected by an aggregate function (such as MAX), on creating indexes, and on writing better queries would be much appreciated!
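In case it helps frame the indexing question, the composite index I have been considering covers the grouping column and both ordering columns (the index name is just illustrative):

-- Hypothetical composite index matching the per-user "latest first" ordering.
CREATE INDEX lives_usr_id_time_stamp_trans_id_idx
    ON lives (usr_id, time_stamp DESC, trans_id DESC);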
P.S. You can use the following to set up the example case:
CREATE TABLE lives (time_stamp timestamp, lives_remaining integer, usr_id integer, trans_id integer);
INSERT INTO lives VALUES ('2000-01-01 07:00', 1, 1, 1);
INSERT INTO lives VALUES ('2000-01-01 09:00', 4, 2, 2);
INSERT INTO lives VALUES ('2000-01-01 10:00', 2, 3, 3);
INSERT INTO lives VALUES ('2000-01-01 10:00', 1, 2, 4);
INSERT INTO lives VALUES ('2000-01-01 11:00', 4, 1, 5);
INSERT INTO lives VALUES ('2000-01-01 11:00', 3, 1, 6);
INSERT INTO lives VALUES ('2000-01-01 13:00', 3, 3, 1);