How can I query a ranking of the users in my database, considering only the latest record for each user?

Suppose I have a database table called "Scrape", set up roughly like this:

    UserID (int)
    UserName (varchar)
    Wins (int)
    Losses (int)
    ScrapeDate (datetime)

I am trying to rank my users by their win/loss ratio. However, every week I will scrape new user data and add another entry to the Scrape table.

How can I query a list of users sorted by win/loss ratio, taking into account only the most recent record (by ScrapeDate) for each user?

Also, do you think it matters that people will be using the site while a scrape may be only partway through?

For example, I could have:

    1 - Bob - Wins: 320 - Losses: 110 - ScrapeDate: 7/8/09
    1 - Bob - Wins: 360 - Losses: 122 - ScrapeDate: 7/17/09
    2 - Frank - Wins: 115 - Losses: 20 - ScrapeDate: 7/8/09

Here the scrape has updated Bob so far and is in the process of updating Frank, but Frank's new row has not been inserted yet. How would you handle this situation?

So my questions are:

  • How would you query only the most recent scrape of each user to determine the rankings?
  • Do you think it matters that the database may be mid-update (especially since the scraper can take up to a day), so that not all users have been fully updated? If so, how would you handle this?

Thank you, and thanks for the answers you gave to my related question:

When scraping a lot of statistics from a web page, how often should I insert the collected results into my database?

+4
3 answers

This is what I call the "greatest-n-per-group" problem. It comes up several times a week on StackOverflow.

I solve this type of problem using an outer join technique:

    SELECT s1.*,
           -- 1.0 * avoids integer division; NULLIF avoids division by zero
           1.0 * s1.wins / NULLIF(s1.losses, 0) AS win_loss_ratio
    FROM Scrape s1
    LEFT OUTER JOIN Scrape s2
      ON (s1.username = s2.username AND s1.ScrapeDate < s2.ScrapeDate)
    WHERE s2.username IS NULL
    ORDER BY win_loss_ratio DESC;

This returns only one row for each username: the row with the highest value in the ScrapeDate column. The outer join tries to match s1 with some other row s2 that has the same username and a later date. If there is no such row, the outer join returns NULL for all of s2's columns, and then we know that s1 is the row with the latest date for that username.

This should also work while a scrape is only partially complete.

This technique is not necessarily as fast as the CTE and ranking-function solutions given in other answers. You should try both and see what works best for you. The reason I prefer my solution is that it works in any flavor of SQL.
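
The outer join technique can be tried against the sample rows from the question. The following is only a minimal sketch: it assumes an in-memory SQLite database driven from Python, ISO-formatted date strings in place of the datetime column, and `1.0 *` plus `NULLIF` to sidestep integer division and division by zero. None of those choices come from the original answer.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scrape (
    UserID INTEGER, UserName TEXT, Wins INTEGER, Losses INTEGER,
    ScrapeDate TEXT  -- ISO dates so that string comparison orders them correctly
);
INSERT INTO Scrape VALUES
    (1, 'Bob',   320, 110, '2009-07-08'),
    (1, 'Bob',   360, 122, '2009-07-17'),
    (2, 'Frank', 115,  20, '2009-07-08');  -- Frank's new row not yet inserted
""")

# Keep s1 only when no later row (s2) exists for the same user.
rows = conn.execute("""
SELECT s1.UserName, s1.Wins, s1.Losses, s1.ScrapeDate,
       1.0 * s1.Wins / NULLIF(s1.Losses, 0) AS win_loss_ratio
FROM Scrape s1
LEFT OUTER JOIN Scrape s2
  ON s1.UserName = s2.UserName AND s1.ScrapeDate < s2.ScrapeDate
WHERE s2.UserName IS NULL
ORDER BY win_loss_ratio DESC
""").fetchall()

for row in rows:
    print(row)
```

With this data Frank (115 wins, 20 losses) ranks above Bob, and the row used for Bob is the one from 2009-07-17. Frank is still ranked from his previous scrape, which is the behavior you get while a scrape is only partly done.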

+3

The answer to part 1 of your question depends on the version of SQL Server you are using. SQL Server 2005 and later offer ranking functions that make this kind of query a little simpler than in SQL Server 2000 and earlier. I can go into more detail if you say which platform you are using.

I suspect that the clearest way to handle part 2 is to display statistics from the last complete scrape; otherwise you won't be showing a consistent ranking (although if your data collection takes 24 hours, there is already a certain amount of latitude in the data).

To simplify this, you could create a table that stores metadata about each scrape run, giving each one an identifier, a start date and a completion date (at the very least), and display only the records that belong to the last complete scrape. To make this easier, you could remove the "scrape date" from the data collection table and replace it with a foreign key that links each row of data to a row in that metadata table.

EDIT

The following code illustrates how to rank users by their most recent result, regardless of whether they are time-aligned:

    create table #scrape
    (userName varchar(20)
    ,wins int
    ,losses int
    ,scrapeDate datetime
    )

    INSERT #scrape
          select 'Alice',100,200,'20090101'
    union select 'Alice',120,210,'20090201'
    union select 'Bob'  ,200,200,'20090101'
    union select 'Clara',300,100,'20090101'
    union select 'Clara',300,210,'20090201'
    union select 'Dave' ,100,10 ,'20090101'

    ;with latestScrapeCTE
    AS
    (
        SELECT *
               ,ROW_NUMBER() OVER (PARTITION BY userName
                                   ORDER BY scrapeDate desc
                                  ) AS rn
               ,wins + losses AS totalPlayed
               ,wins - losses AS winDiff
        from #scrape
    )
    SELECT userName
           ,wins
           ,losses
           ,scrapeDate
           ,winDiff
           ,totalPlayed
           ,RANK() OVER (ORDER BY winDiff desc
                                  ,totalPlayed desc
                        ) as rankPos
    FROM latestScrapeCTE
    WHERE rn = 1
    ORDER BY rankPos

EDIT 2

An illustration of using a metadata table to select the last complete scrape:

    create table #scrape_run
    (runID int identity
    ,startDate datetime
    ,completedDate datetime
    )

    create table #scrape
    (userName varchar(20)
    ,wins int
    ,losses int
    ,scrapeRunID int
    )

    -- a null completion date indicates that the scrape is not complete
    INSERT #scrape_run (startDate, completedDate)
          select '20090101', '20090102'
    union select '20090201', null

    INSERT #scrape
          select 'Alice',100,200,1
    union select 'Alice',120,210,2
    union select 'Bob'  ,200,200,1
    union select 'Clara',300,100,1
    union select 'Clara',300,210,2
    union select 'Dave' ,100,10 ,1

    ;with latestScrapeCTE
    AS
    (
        -- ORDER BY makes TOP 1 pick the most recent completed run
        SELECT TOP 1 runID
               ,startDate
        FROM #scrape_run
        WHERE completedDate IS NOT NULL
        ORDER BY startDate desc
    )
    SELECT userName
           ,wins
           ,losses
           ,startDate AS scrapeDate
           ,wins - losses AS winDiff
           ,wins + losses AS totalPlayed
           ,RANK() OVER (ORDER BY (wins - losses) desc
                                  ,(wins + losses) desc
                        ) as rankPos
    FROM #scrape
    JOIN latestScrapeCTE
      ON runID = scrapeRunID
    ORDER BY rankPos

0

Try something like:

  • Select the user ID and maximum date of the last record for each user.
  • Select and order entries to get a ranking based on the above query results.

This should work, although how well it performs will depend on the size of your database.

    DECLARE @last_entries TABLE(id int, dte datetime)

    -- insert date (dte) of last entry for each user (id)
    INSERT INTO @last_entries (id, dte)
    SELECT UserID, MAX(ScrapeDate)
    FROM Scrape WITH (NOLOCK)
    GROUP BY UserID

    -- select ranking
    -- optionally you can use the RANK() OVER() function to get a rank value
    SELECT UserName, Wins, Losses
    FROM @last_entries
    JOIN Scrape WITH (NOLOCK)
      ON UserID = id AND ScrapeDate = dte
    -- most wins first; swap in a win/loss ratio expression if preferred
    ORDER BY Wins DESC, Losses

I have not tested this code, so it may not compile on the first run.
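
The two steps above (find each user's latest ScrapeDate, then join back to fetch that row) can be checked with a small runnable sketch. It assumes an in-memory SQLite database driven from Python, a derived table named `last_entries` in place of the table variable, ISO date strings, and ordering by win/loss ratio as the question asks. All of those are illustrative choices rather than part of the answer above.

```python
import sqlite3

# Assumption: in-memory SQLite with the sample rows from the question.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Scrape (
    UserID INTEGER, UserName TEXT, Wins INTEGER, Losses INTEGER, ScrapeDate TEXT
);
INSERT INTO Scrape VALUES
    (1, 'Bob',   320, 110, '2009-07-08'),
    (1, 'Bob',   360, 122, '2009-07-17'),
    (2, 'Frank', 115,  20, '2009-07-08');
""")

# Step 1 (derived table): latest ScrapeDate per user.
# Step 2: join back to Scrape to fetch that row, then order by ratio.
latest = db.execute("""
SELECT s.UserName, s.Wins, s.Losses
FROM Scrape s
JOIN (SELECT UserID AS id, MAX(ScrapeDate) AS dte
      FROM Scrape
      GROUP BY UserID) last_entries
  ON s.UserID = last_entries.id AND s.ScrapeDate = last_entries.dte
ORDER BY 1.0 * s.Wins / NULLIF(s.Losses, 0) DESC
""").fetchall()

# Assign rank positions in application code (no ties in this sample data).
ranking = [(pos, name, wins, losses)
           for pos, (name, wins, losses) in enumerate(latest, start=1)]
for entry in ranking:
    print(entry)
```

If two rows for the same user ever share the same ScrapeDate, this join returns both of them, so a unique constraint on (UserID, ScrapeDate) would be worth considering.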

0
