Almost real-time RSS feed updates

I have a system that fetches several hundred RSS feeds. They are currently on a 10-minute refresh cycle, but I would like to refresh them faster. What is a good strategy for fetching RSS feeds in real time, or at least at shorter intervals?

Some solutions I came across:

  • poll after 1 minute; if there are no changes, poll again after 2 minutes, then 4, then 8, etc.
  • measure the average interval / variance between updates for each feed and sort the feeds into buckets (this feed is updated every 3 minutes, so check it every minute; this one is updated every week, so check it every day, etc.)
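
A minimal sketch of the second idea, assuming you keep a sorted list of past update times (in seconds) for each feed. The function name, the `checks_per_update` divisor, and the one-minute and one-day bounds are illustrative assumptions, not part of any library.

```python
from statistics import mean


def poll_interval_from_history(timestamps, checks_per_update=3,
                               min_s=60.0, max_s=86400.0, default=600.0):
    """Derive a polling interval (seconds) from a feed's past update times.

    timestamps: sorted update times in seconds (hypothetical input).
    checks_per_update: how many polls to aim for per average update gap
    (an assumption; 3 matches "updated every 3 minutes, check every minute").
    """
    if len(timestamps) < 2:
        # Not enough history to measure a gap; fall back to the default cycle.
        return default
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    interval = mean(gaps) / checks_per_update
    # Clamp so fast feeds are not hammered and slow feeds are still checked daily.
    return max(min_s, min(max_s, interval))
```

With these assumed bounds, a feed that updates every 3 minutes is polled every minute, and a feed that updates weekly is capped at one poll per day.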
+4
3 answers

I used something like the first option. Start with a default wait time between fetches of a feed. When new items are found, decrease the wait by 10%; otherwise increase it by 10%. Apply this adjustment on every fetch, and the system tunes itself.

You can use different percentages, for example decreasing the wait faster than you increase it, to respond better to changing update rates.

Set minimum and maximum wait times so the interval stays within a predefined range.

This is not perfect, but for me it was enough.
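
The adjustment described in this answer could be sketched roughly as follows. The function name, the 10% factors as defaults, and the one-minute and one-hour bounds are assumptions chosen for illustration.

```python
def next_interval(current, found_new, shrink=0.9, grow=1.1,
                  min_s=60.0, max_s=3600.0):
    """Return the next wait time (seconds) for one feed.

    current: the wait used before this fetch.
    found_new: True if the fetch returned items not seen before.
    shrink/grow: multiplicative factors (assumed 10% each way).
    """
    factor = shrink if found_new else grow
    # Clamp so the wait never leaves the predefined range.
    return max(min_s, min(max_s, current * factor))
```

Call it once after every fetch and store the result per feed. Passing a smaller `shrink`, such as 0.7, makes the wait drop faster than it grows.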

0

Polling cannot be both fast and efficient. Either you poll more often (and are less efficient), or you are more efficient by polling less often.

The only way to achieve near real time is to poll at the right time :)

Fortunately, some publishers (more and more!) use PubSubHubbub to push their feed updates and notify subscribers. Other services such as Superfeedr (I work on Superfeedr) use different methods to work out the best time to fetch a feed (based on historical updates, updates in related feeds, etc.).
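
As a rough sketch of how a subscriber might use PubSubHubbub: a feed that supports it advertises its hub with an Atom `link` element whose `rel` is `hub`, and a subscription is requested by POSTing a form with `hub.mode`, `hub.topic` and `hub.callback` to that hub. The helper names below are hypothetical, and the code only discovers the hub and builds the form body; it does not send the request or handle the hub's verification callback.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "http://www.w3.org/2005/Atom"


def find_hub(feed_xml):
    """Return the href of the first Atom link with rel="hub", or None."""
    root = ET.fromstring(feed_xml)
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "hub":
            return link.get("href")
    return None


def subscription_body(topic_url, callback_url):
    """Build the form-encoded body of a subscription request.

    topic_url: the feed URL you want updates for.
    callback_url: a URL on your server that the hub will call (assumed to exist).
    """
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })
```

Feeds for which `find_hub` returns None still have to be polled.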

+2

Although this is only part of the solution, you can also (if the feed is served over HTTP) check the Cache-Control and Expires headers of the feed's response for hints on how often you should fetch it.
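
A minimal sketch of reading those hints, assuming the response headers are available as a plain dict with the exact keys `Cache-Control` and `Expires` (real HTTP clients usually provide a case-insensitive mapping). The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def suggested_wait(headers, now=None):
    """Return the number of seconds the server hints the feed stays fresh, or None."""
    now = now or datetime.now(timezone.utc)
    # Prefer Cache-Control: max-age, which takes precedence over Expires.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    expires = headers.get("Expires")
    if expires:
        try:
            when = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return None  # malformed date such as "0": no usable hint
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return max(0, int((when - now).total_seconds()))
    return None
```

The result is only a hint. Many feeds send no caching headers or send very conservative ones, so it works best as one input to the polling interval rather than the only one.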

0
