Suppose I have a database table called "Scrape" with columns like these:
- UserID (int)
- UserName (varchar)
- Wins (int)
- Losses (int)
- ScrapeDate (datetime)
I am trying to rank my users by their win/loss ratio. However, every week I parse new user data and insert another row for each user into the Scrape table.
How can I query a list of users sorted by win/loss ratio, taking into account only each user's most recent record (by ScrapeDate)?
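A common approach is "latest row per group": join each row to the newest ScrapeDate for its user, so older rows drop out, then sort by the ratio. The sketch below uses Python's built-in sqlite3 module with an in-memory database holding the sample rows from this question. It assumes SQLite, ISO-8601 date strings, at most one row per user per ScrapeDate, and that users with zero losses simply get a NULL ratio; `make_sample_db` and `latest_rankings` are hypothetical helper names.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory Scrape table holding the example rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no native datetime type, so ScrapeDate is stored as ISO-8601
    # text ('YYYY-MM-DD'); string order then matches chronological order.
    conn.execute(
        "CREATE TABLE Scrape (UserID INTEGER, UserName TEXT, "
        "Wins INTEGER, Losses INTEGER, ScrapeDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO Scrape VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Bob", 320, 110, "2009-07-08"),
            (1, "Bob", 360, 122, "2009-07-17"),
            (2, "Frank", 115, 20, "2009-07-08"),
        ],
    )
    return conn


def latest_rankings(conn):
    """Rank users by Wins/Losses using only each user's newest Scrape row.

    Assumes at most one row per (UserID, ScrapeDate); duplicates would
    produce duplicate output rows.
    """
    query = """
        SELECT s.UserID, s.UserName, s.Wins, s.Losses,
               -- CAST avoids integer division; NULLIF turns 0 losses into a
               -- NULL ratio, which SQLite sorts last under DESC.
               CAST(s.Wins AS REAL) / NULLIF(s.Losses, 0) AS Ratio
        FROM Scrape AS s
        JOIN (
            -- Newest ScrapeDate per user.
            SELECT UserID, MAX(ScrapeDate) AS LatestDate
            FROM Scrape
            GROUP BY UserID
        ) AS latest
          ON latest.UserID = s.UserID
         AND latest.LatestDate = s.ScrapeDate
        ORDER BY Ratio DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in latest_rankings(make_sample_db()):
        print(row)
```

With the sample data, Frank (115/20 = 5.75) ranks above Bob, whose 7/8/09 row is ignored in favour of his 7/17/09 row (360/122, about 2.95). On databases that support window functions, `ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY ScrapeDate DESC)` is a common alternative to the MAX join.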
Also, in your opinion, does it matter if people visit the site while a scrape is still in progress?
For example, I could have:
1 - Bob - Wins: 320 - Losses: 110 - ScrapeDate: 7/8/09
1 - Bob - Wins: 360 - Losses: 122 - ScrapeDate: 7/17/09
2 - Frank - Wins: 115 - Losses: 20 - ScrapeDate: 7/8/09
Here, the 7/17/09 scrape has already updated Bob and is in the process of updating Frank, whose new row has not been inserted yet. How would you handle this situation?
So my questions are:
- How would you query only the most recent scrape of each user to determine the ranking?
- Do you think it matters that the database may be mid-update (especially since the scraper can take up to a day), so that not all users have been updated yet? If so, how would you handle this?
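One way to keep the ranking consistent while a day-long scrape is still running is to rank only from scrapes that have finished. The sketch below again uses Python's sqlite3; the ScrapeRun table, the RunID and Completed columns, and the `make_sample_db`, `finish_run` and `completed_rankings` helpers are hypothetical additions, not part of the schema described above. It assumes RunIDs increase over time and that the scraper marks its run complete only after inserting its last user.

```python
import sqlite3


def make_sample_db():
    """Hypothetical schema: each weekly scrape is a ScrapeRun, and every
    Scrape row records which run wrote it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ScrapeRun (
            RunID INTEGER PRIMARY KEY,      -- assumed to increase over time
            StartedOn TEXT,
            Completed INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE Scrape (
            UserID INTEGER, UserName TEXT, Wins INTEGER, Losses INTEGER,
            ScrapeDate TEXT, RunID INTEGER REFERENCES ScrapeRun(RunID)
        );
        """
    )
    # Run 1 (7/8/09) finished; run 2 (7/17/09) has written Bob but not Frank yet.
    conn.executemany(
        "INSERT INTO ScrapeRun VALUES (?, ?, ?)",
        [(1, "2009-07-08", 1), (2, "2009-07-17", 0)],
    )
    conn.executemany(
        "INSERT INTO Scrape VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Bob", 320, 110, "2009-07-08", 1),
            (2, "Frank", 115, 20, "2009-07-08", 1),
            (1, "Bob", 360, 122, "2009-07-17", 2),
        ],
    )
    return conn


def finish_run(conn, run_id):
    """Called by the scraper only after its last user row has been inserted."""
    with conn:  # the connection context manager commits on success
        conn.execute("UPDATE ScrapeRun SET Completed = 1 WHERE RunID = ?", (run_id,))


def completed_rankings(conn):
    """Rank users using each user's newest row from a completed run only."""
    query = """
        SELECT s.UserID, s.UserName, s.Wins, s.Losses,
               CAST(s.Wins AS REAL) / NULLIF(s.Losses, 0) AS Ratio
        FROM Scrape AS s
        JOIN (
            -- Newest completed run that contains a row for each user.
            SELECT s2.UserID, MAX(s2.RunID) AS LatestRun
            FROM Scrape AS s2
            JOIN ScrapeRun AS r ON r.RunID = s2.RunID
            WHERE r.Completed = 1
            GROUP BY s2.UserID
        ) AS latest
          ON latest.UserID = s.UserID
         AND latest.LatestRun = s.RunID
        ORDER BY Ratio DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = make_sample_db()
    print(completed_rankings(conn))  # Bob still ranked on his 7/8/09 row
    finish_run(conn, 2)
    print(completed_rankings(conn))  # Bob now ranked on his 7/17/09 row
```

Until `finish_run(conn, 2)` is called, Bob is still ranked on his 7/8/09 row, so everyone is compared on the same scrape. If the whole weekly load can be written at once, inserting it inside a single transaction gives readers the same all-or-nothing view without the extra table; the completion flag is aimed at a scraper that commits as it goes over a day.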
Thank you, and thanks for the answers you gave on my related question:
When scraping a lot of statistics from a web page, how often should I insert the collected results into my database?