How do I get the href attributes of the tags in this string?

This string contains a number of li tags. I want to get the href attribute of each a tag, such as:

http://bipardeh94.blogfa.com
http://avaejam.blogfa.com

and so on. I want to do this in C#. How can I do it? I am using this code, but it is incomplete.

 int indexStartUl = _codeHtml.IndexOf("<ul");
 int indexEndUl = _codeHtml.IndexOf("</ul>");
 _codeHtml = _codeHtml.Substring(indexStartUl, indexEndUl);

Please help.

  <ul class="ull">
  <li><a href="http://bipardeh94.blogfa.com" target="_blank">باغ بلور</a><span class="ur">bipardeh94.blogfa.com</span><span class="ds">فرهنگی-خبری-علمی</span></li>
  <li><a href="http://avaejam.blogfa.com" target="_blank">هزار نکته </a><span class="ur">avaejam.blogfa.com</span><span class="ds"> يك نكته از هزار نكته باشد تا بعد </span></li>
  <li><a href="http://prkangavar.blogfa.com" target="_blank">روابط عمومی دانشگاه آزاداسلامی کنگاور</a><span class="ur">prkangavar.blogfa.com</span><span class="ds">اخبار دانشگاه</span></li>
  <li><a href="http://bordekhoun.blogfa.com" target="_blank">وبلاگ اطلاع رسانی بردخون</a><span class="ur">bordekhoun.blogfa.com</span><span class="ds">اخباروگزارشات وتحلیل ها درباره بردخون</span></li>
  <li><a href="http://mahinvare.blogfa.com" target="_blank">تدوری های نوین</a><span class="ur">mahinvare.blogfa.com</span><span class="ds">نظریه های علوم انسانی باید متحول شود</span></li>
  <li><a href="http://zanjanuniversity.blogfa.com" target="_blank">دانشگاه زنجان</a><span class="ur">zanjanuniversity.blogfa.com</span><span class="ds">اخبار دانشگاهیان زنجان و دانشگاه آزاد زنجان و سیستم ثبت نام شهردای زنجان </span> </li>
  </ul>
4 answers

You can use Selenium WebDriver functionality:

 // Select the a elements inside each li; the li elements themselves have no href.
 IList<IWebElement> links = driver.FindElements(By.CssSelector(".ull > li > a"));
 foreach (IWebElement link in links)
 {
     string href = link.GetAttribute("href");
 }

This finds all the a elements inside the li elements that are children of the element with the class ull, then iterates over the list and reads the href attribute of each one.


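You can use the Html Agility Pack.

Html Agility Pack example:

  // Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
  HtmlDocument doc = new HtmlDocument();
  doc.Load("file.htm");
  // Select every a element that has an href attribute.
  foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
  {
      HtmlAttribute att = link.Attributes["href"];
      att.Value = FixLink(att); // FixLink stands for your own link-processing method
  }
  doc.Save("file.htm");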

References:

How to use the HTML Agility Pack

http://www.mikesdotnetting.com/article/273/using-the-htmlagilitypack-to-parse-html-in-asp-net
http://www.codeproject.com/Articles/691119/Html-Agility-Pack-Massive-information-extraction-f

I hope this information helps.
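
Applied to the list in the question, the same library can read the href values directly. The following is only a sketch, written as a C# top-level program and assuming the HtmlAgilityPack NuGet package is installed; the shortened sample markup and the XPath expression are illustrative choices, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be installed)

// Shortened, illustrative version of the markup from the question.
string html =
    "<ul class=\"ull\">" +
    "<li><a href=\"http://bipardeh94.blogfa.com\" target=\"_blank\">one</a></li>" +
    "<li><a href=\"http://avaejam.blogfa.com\" target=\"_blank\">two</a></li>" +
    "</ul>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var hrefs = new List<string>();

// SelectNodes returns null when nothing matches, so check before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//ul[@class='ull']/li/a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        // The second argument is the fallback used when the attribute is missing.
        hrefs.Add(link.GetAttributeValue("href", string.Empty));
    }
}

foreach (string href in hrefs)
{
    Console.WriteLine(href);
}
```

Restricting the XPath to ul[@class='ull'] keeps links from other parts of the page out of the result, which is what the Substring call in the question was trying to achieve.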


To understand the problem, look at how Substring works:

Substring(a, b)

  • a: the index at which the substring starts
  • b: the length of the substring

In your example you pass:

a as the index of the opening <ul

b as the index of the closing </ul> // Error: b is treated as a length, so it must not be an index measured from the start of the string!

You need to do the following:

 int c = b - a; // length of the text from <ul up to, but not including, </ul>
 _codeHtml = _codeHtml.Substring(a, c);
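
The index-versus-length distinction can be checked with a small self-contained sketch, written as a C# top-level program. The sample string and the variable names are made up for illustration; unlike the snippet above, this version also adds the length of the closing tag so that </ul> stays in the result, and it guards against either tag being absent.

```csharp
using System;

// Illustrative input: text before and after the list that should be cut away.
string codeHtml =
    "<p>intro</p><ul class=\"ull\"><li><a href=\"http://avaejam.blogfa.com\">link</a></li></ul><p>outro</p>";

int start = codeHtml.IndexOf("<ul", StringComparison.Ordinal);
// Search for the closing tag only after the opening one; skip the search if <ul was not found.
int end = start < 0 ? -1 : codeHtml.IndexOf("</ul>", start, StringComparison.Ordinal);

string ulBlock = string.Empty;
if (start >= 0 && end >= 0)
{
    // The second argument of Substring is a length, so subtract the start index.
    // Adding "</ul>".Length keeps the closing tag in the extracted block.
    int length = end - start + "</ul>".Length;
    ulBlock = codeHtml.Substring(start, length);
}

Console.WriteLine(ulBlock);
```

Passing end directly as the second argument, as in the question, would ask for more characters than intended and can throw an ArgumentOutOfRangeException when start + end exceeds the length of the string.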

Without any external library or tool, you can use the following statement (it needs using System.Linq; and splits on href=" because the markup in the question uses double quotes):

 var hrefs = html.Split(new[] { "href=\"" }, StringSplitOptions.RemoveEmptyEntries)
     .Where(o => o.StartsWith("http"))
     .Select(o => o.Substring(0, o.IndexOf('"')));

This gives you a sequence with all the href values, for example the following result:

 http://bipardeh94.blogfa.com
 http://avaejam.blogfa.com
 http://prkangavar.blogfa.com
 http://bordekhoun.blogfa.com
 http://mahinvare.blogfa.com
 http://zanjanuniversity.blogfa.com

A complete example is available in this .NET Fiddle.
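
For reference, a self-contained variant of this string-splitting idea might look like the sketch below, written as a C# top-level program. ExtractHrefs is a hypothetical helper name, the sample markup is a shortened stand-in for the list in the question, and the code assumes href values are wrapped in double quotes. Plain string splitting is fragile compared with a real HTML parser, so treat it as a quick approach for markup you control.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits on href=" and keeps the text up to the closing quote.
static List<string> ExtractHrefs(string html)
{
    return html
        .Split(new[] { "href=\"" }, StringSplitOptions.None)
        .Skip(1)                                   // the text before the first href=" is not a URL
        .Select(part => new { part, quote = part.IndexOf('"') })
        .Where(x => x.quote >= 0)                  // ignore attributes with no closing quote
        .Select(x => x.part.Substring(0, x.quote))
        .ToList();
}

// Shortened, illustrative version of the markup from the question.
string html =
    "<ul class=\"ull\">" +
    "<li><a href=\"http://bipardeh94.blogfa.com\" target=\"_blank\">one</a></li>" +
    "<li><a href=\"http://avaejam.blogfa.com\" target=\"_blank\">two</a></li>" +
    "</ul>";

List<string> hrefs = ExtractHrefs(html);
foreach (string href in hrefs)
{
    Console.WriteLine(href);
}
```

Checking the index of the closing quote before calling Substring avoids an exception on malformed input, which the one-line version does not guard against.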

