The decision really depends on whether you are scraping one specific site or trying to write a program that will work on any site.
You can see which areas change frequently by doing something like this:
diff <(curl http://stackoverflow.com/questions/) <(sleep 15; curl http://stackoverflow.com/questions/)
If you only care about one site, you can write several sed expressions to filter out volatile material, such as timestamps. Repeat until the diff shrinks to a few small fields.
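A minimal sketch of such a filter might look like the following. The function name `normalize` and both patterns are assumptions made for illustration; they are not taken from any real site's markup, so you would replace them with whatever the diff above shows changing on your page.

```shell
# normalize: read HTML on stdin and replace volatile fields with fixed
# placeholders, so two fetches of an otherwise unchanged page compare equal.
# Both patterns are hypothetical examples; adapt them to the site you watch.
normalize() {
  sed -E \
    -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}[T ][0-9]{2}:[0-9]{2}:[0-9]{2}Z?/TIMESTAMP/g' \
    -e 's/[0-9]+ (sec|min|hour)s? ago/AGO/g'
}

# Possible usage (bash, needs network access):
#   url=http://stackoverflow.com/questions/
#   diff <(curl -s "$url" | normalize) <(sleep 15; curl -s "$url" | normalize)
```

Each extra `-e` expression removes one more source of noise; anything still shown by the diff afterwards is either a real change or a field you have not filtered yet.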
The general problem, handling any site, is much more complicated. As a starting point, I would suggest comparing the total number of words on the page.
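As a rough sketch of the word-count idea, assuming the two fetches have already been saved to files: the helper name `word_delta` and the file names in the usage comment are hypothetical, and what counts as a "significant" difference is something you would have to tune yourself.

```shell
# word_delta OLD NEW: print the absolute difference in word count
# between two saved copies of a page.
word_delta() {
  old=$(wc -w < "$1")
  new=$(wc -w < "$2")
  delta=$((new - old))
  if [ "$delta" -lt 0 ]; then
    delta=$((-delta))
  fi
  echo "$delta"
}

# Possible usage (needs network access; file names are placeholders):
#   curl -s http://stackoverflow.com/questions/ > before.html
#   sleep 15
#   curl -s http://stackoverflow.com/questions/ > after.html
#   word_delta before.html after.html
```

This counts markup as well as visible text, so it is only a coarse signal: a small delta suggests noise such as timestamps, while a large one suggests the content itself changed.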
brianegge