First, I would recommend keeping at least one separate profile where you store this information. You want to know in what format people consume articles, so that you know what content to create.
Then you need to create an advanced profile filter. Here is a Google article about them.
And here is the GPF thread on the same issue.
To adapt it to your needs, your first field might look something like this:
(\/site\/[0-9]{1,2}\/[0-9]{1,2}\/[0-9]+)($)?|\.html|\.pdf
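The first group matches /site/, then two path segments of one or two digits each, then a numeric article ID. The rest of the expression means "end of the URL, OR .html or .pdf".
$A1 refers to the first parenthesized group matched in Field A, so that is what you reference in the filter's output.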
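If you want to check what the first group captures before saving the filter, a quick local test can help. The sketch below uses Python's re module as a stand-in for the Analytics regex engine, and it uses a grouped variant of the pattern, ($|\.html|\.pdf), which is my assumption about the intent (the article path must be followed by the end of the URL, .html, or .pdf). The function name extract_a1 and the sample URIs are made up for illustration.

```python
import re

# Grouped variant of the pattern above (an assumption about the intent):
# capture the article path, then require end-of-string, .html, or .pdf.
ARTICLE_RE = re.compile(
    r"(\/site\/[0-9]{1,2}\/[0-9]{1,2}\/[0-9]+)($|\.html|\.pdf)"
)


def extract_a1(request_uri):
    """Return what $A1 would hold (the first group), or None if no match."""
    match = ARTICLE_RE.search(request_uri)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical request URIs, for illustration only.
    samples = [
        "/site/12/5/12345",
        "/site/12/5/12345.html",
        "/site/1/23/987.pdf",
        "/about.html",
    ]
    for uri in samples:
        print(uri, "->", extract_a1(uri))
```

With the grouped tail, a URL such as /about.html no longer matches at all, whereas the ungrouped alternation would match it and leave $A1 empty.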