To get the latest entries, download the standard reading-list feed, which returns entries newest-first. You will get a continuation token in the XML result, looking something like this:
```
<gr:continuation>CArhxxjRmNsC</gr:continuation>
```
Scan the results, pulling out anything that is new to you. You should find that either all of the results are new, or everything up to a certain point is new and everything after that you already have.
In the latter case you are done, but in the former you need to find new items older than the ones you have already fetched. Do this by using the continuation token to get results starting just after the last result in the set you just received, passing it in your GET request as the `c` parameter, for example:

```
http://www.google.com/reader/atom/user/-/state/com.google/reading-list?c=CArhxxjRmNsC
```
Continue this process until you have everything.
The `n` parameter, which controls the number of items returned, works well with this, and you can change it as you go. If the scan frequency is set by the user, and can therefore be very frequent or very infrequent, you can use an adaptive algorithm to reduce network traffic and processing load. First request a small number of recent entries, say five (add `n=5` to your GET URL). If they are all new, then in the next request, where you use the continuation token, ask for a larger number, say 20. If those are still all new, either the feed has a lot of updates or it has been a while since the last check, so continue in batches of 100 or so.
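The paging-with-growing-batches idea can be sketched as follows. This is a minimal sketch, not the actual client: `fetch_page` is a hypothetical stand-in for your HTTP layer (in reality it would GET the reading-list URL with `n=<count>` and `c=<continuation>` and parse the Atom XML), and the item/`id` shapes are assumptions.

```python
def fetch_new_items(fetch_page, known_ids, batch_sizes=(5, 20, 100)):
    """Page backwards through the feed until an already-known item appears.

    fetch_page(count, continuation) -> (items, next_continuation)
    where items is a newest-first list of dicts with an "id" key.
    """
    new_items = []
    continuation = None
    size_index = 0
    while True:
        count = batch_sizes[size_index]
        items, continuation = fetch_page(count, continuation)
        for item in items:
            if item["id"] in known_ids:
                return new_items          # everything past this point is old
            new_items.append(item)
        if continuation is None:          # feed exhausted
            return new_items
        # Every item in this batch was new: grow the batch for the next request.
        size_index = min(size_index + 1, len(batch_sizes) - 1)
```

The real `fetch_page` would simply append `n=count` and, when present, `c=continuation` to the reading-list URL shown above.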
However, correct me if I am mistaken here, but you also want to know, after you have downloaded an item, whether its state changes from "unread" to "read" because the person read it using the Google Reader interface.
One approach to this:
- Update the Google read status of any items that have been read locally.
- Check and save the feed's unread count. (You want to do this before the next step, so that you guarantee no new items arrive between downloading the newest items and checking the read counts.)
- Download the newest items.
- Calculate your local read count and compare it with Google's. If the feed has a higher read count than you calculated, you know that something was read on the Google side.
- If something was read on Google, start downloading the read items and comparing them with your database of unread items. You will find some items that Google says are read but your database claims are unread; update them. Continue doing this until you have found a number of items equal to the difference between your read count and Google's, or until the downloading becomes unreasonable.
- If you have not found all the read items, c'est la vie; record the remaining number as an "unfindable unread" total, which you will also need to include in your next calculation of the count you believe is locally unread.
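The count-comparison steps above can be sketched like this. The names are illustrative assumptions: `google_unread_count` is the saved counter from the API, `fetch_read_items` stands in for paging through Google's read-items feed, and `local_db` is any store mapping item id to a read flag.

```python
def reconcile_read_state(local_db, google_unread_count, fetch_read_items,
                         max_pages=10):
    """Mark items read locally when Google says they were read remotely.

    Returns how many items were found and updated; the caller should record
    any shortfall as the "unfindable unread" adjustment described above.
    """
    local_unread = sum(1 for read in local_db.values() if not read)
    missing = local_unread - google_unread_count
    if missing <= 0:
        return 0                               # nothing was read on Google's side
    found = 0
    for page in fetch_read_items(max_pages):   # pages of read-item ids
        for item_id in page:
            if item_id in local_db and not local_db[item_id]:
                local_db[item_id] = True       # Google says read; update locally
                found += 1
                if found == missing:
                    return found               # accounted for the whole difference
    return found                               # gave up; missing - found remain
```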
If a user subscribes to many different blogs, he probably also labels them extensively, so you can do this whole thing on a per-label basis rather than across the entire feed. That should help keep the amount of data down, since you will not need to do any transfers for labels where the user has not read anything new in Google Reader.
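Per-label polling uses the same machinery as above; only the feed URL changes. A small sketch of building those URLs, with the caveat that the `user/-/label/<name>` path is my recollection of the unofficial API's pattern and should be verified against real traffic:

```python
from urllib.parse import quote

# Assumed per-label endpoint of the unofficial Google Reader API.
BASE = "http://www.google.com/reader/atom/user/-/label/"

def label_feed_url(label, count=None, continuation=None):
    """Build a per-label feed URL with optional n= and c= parameters."""
    url = BASE + quote(label)          # labels may contain spaces etc.
    params = []
    if count is not None:
        params.append("n=%d" % count)
    if continuation is not None:
        params.append("c=" + continuation)
    return url + ("?" + "&".join(params) if params else "")
```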
This whole scheme can be applied to other statuses as well, such as starred or unstarred.
Now, as you say, this

> ...would mean that I need to keep my own read/unread state on the client, and that entries get marked as read when the user visits the online version of Google Reader. That does not work for me.
Right. Neither keeping the local read/unread state (since you are keeping a database of all the items anyway) nor marking items read in Google (which the API supports) is very difficult, so why doesn't this work for you?
There is one more wrinkle, however: the user may mark something that was read as unread on Google. This throws a bit of a wrench into the system. My suggestion there, if you really want to handle it, is to assume the user will generally only touch more recent items, and to download the last couple of hundred items every time, checking the status on all of them. (This is not that bad: downloading 100 items took me anywhere from 0.3 s for 300 KB to 2.5 s for 2.5 MB, albeit on a very fast broadband connection.)
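That "re-check the last couple of hundred items" pass can be sketched as follows. `fetch_recent` is a hypothetical stand-in for a GET of the reading list with `n=N` that also parses out each item's current read state; overwriting local flags with Google's picks up both read-to-unread and unread-to-read changes.

```python
def sync_recent_statuses(local_db, fetch_recent, n=200):
    """Overwrite local read flags with Google's for the newest n items."""
    changed = 0
    for item in fetch_recent(n):            # [{"id": ..., "read": bool}, ...]
        item_id, read = item["id"], item["read"]
        if item_id in local_db and local_db[item_id] != read:
            local_db[item_id] = read        # trust Google's state for recent items
            changed += 1
    return changed
```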
Again, if the user has a large number of subscriptions, he probably also has a reasonably large number of labels, so doing this on a per-label basis will speed it up. In fact, I would suggest that you not only check on a per-label basis, but also spread out the checks, checking a single label every minute rather than everything every twenty minutes. You can also run this "big check" for status changes on older items less often than the "new stuff" check, perhaps every few hours, if you want to keep bandwidth down.
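One simple way to spread the work out as suggested above, assuming one scheduler tick per minute; the tick interval and the `big_check_every` ratio are illustrative, not prescribed:

```python
import itertools

def make_scheduler(labels, big_check_every=120):
    """Yield (label, do_big_check) pairs, one per tick (e.g. one per minute).

    Labels are checked round-robin for new items; the expensive older-item
    status re-check is flagged only every `big_check_every` ticks.
    """
    cycle = itertools.cycle(labels)
    for tick in itertools.count(1):
        yield next(cycle), tick % big_check_every == 0
```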
This is a bit of a bandwidth hog, mainly because you need to download the full article from Google just to check its status. Unfortunately, I cannot see any way around that in the API docs available to us. My only real advice is to minimize status checks on items that are not new.