The first example on the home page does something very similar, but consider:
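    HtmlDocument doc = new HtmlDocument();
    doc.Load("file.htm"); // use doc.LoadHtml(htmlSource) if the HTML is not in a file
    // Current Html Agility Pack releases expose DocumentNode and Attributes[...];
    // the older home-page sample used DocumentElement and link["href"].
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        string href = link.Attributes["href"].Value;
        // store href somewhere
    }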
So for img/@src, just replace each a with img and each href with src. You might even be able to simplify to:
    foreach (HtmlNode node in doc.DocumentNode
        .SelectNodes("//a/@href | //img/@src"))
    {
        list.Add(node.Value);
    }
For relative URL handling, look at the Uri class.
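As a minimal sketch of the relative URL handling, assuming you know the URL of the page the links came from: the `Uri` class can combine a base address with a relative `href` or `src`. The names `LinkResolver` and `Resolve` and the example.com addresses are made up for illustration; only `System.Uri` itself is the real API.

```csharp
using System;

public static class LinkResolver
{
    // Hypothetical helper: resolves an href/src value against the URL of the
    // page it was found on. Returns null if the pair cannot form a valid URI.
    public static string Resolve(string pageUrl, string href)
    {
        Uri baseUri = new Uri(pageUrl, UriKind.Absolute);
        Uri absolute;
        if (Uri.TryCreate(baseUri, href, out absolute))
        {
            return absolute.AbsoluteUri;
        }
        return null;
    }

    public static void Main()
    {
        string page = "http://example.com/docs/index.htm"; // placeholder page URL

        // Relative paths are resolved against the page's folder.
        Console.WriteLine(Resolve(page, "../img/logo.png")); // http://example.com/img/logo.png
        Console.WriteLine(Resolve(page, "page2.htm"));       // http://example.com/docs/page2.htm

        // Links that are already absolute pass through unchanged.
        Console.WriteLine(Resolve(page, "http://other.example/x")); // http://other.example/x
    }
}
```

In the loops above you would call something like `Resolve(pageUrl, href)` before storing each value, so that the list ends up holding absolute URLs only.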
Marc Gravell