Recursive pattern search in string
I am using C#. I have a string:
    <li> <a href="abc">P1</a>
      <ul>
        <li><a href = "bcd">P11</a></li>
        <li><a href = "bcd">P12</a></li>
        <li><a href = "bcd">P13</a></li>
        <li><a href = "bcd">P14</a></li>
      </ul>
    </li>
    <li> <a href="abc">P2</a>
      <ul>
        <li><a href = "bcd">P21</a></li>
        <li><a href = "bcd">P22</a></li>
        <li><a href = "bcd">P23</a></li>
      </ul>
    </li>
    <li> <a href="abc">P3</a>
      <ul>
        <li><a href = "bcd">P31</a></li>
        <li><a href = "bcd">P32</a></li>
        <li><a href = "bcd">P33</a></li>
        <li><a href = "bcd">P34</a></li>
      </ul>
    </li>
    <li> <a href="abc">P4</a>
      <ul>
        <li><a href = "bcd">P41</a></li>
        <li><a href = "bcd">P42</a></li>
      </ul>
    </li>

My goal is to populate the following list from the string above.
    List<class1>

class1 has two properties:

    string parent;
    List<string> children;

It should put P1 in parent and P11, P12, P13, P14 in children, and build a list of such objects.
Any suggestion would be helpful.
Edit
Example
    public List<class1> getElements()
    {
        List<class1> temp = new List<class1>();
        foreach (/* each <a> element in the string */)
        {
            // in the recursive loop
            List<string> str = new List<string>();
            str.Add("P11");
            str.Add("P12");
            str.Add("P13");
            str.Add("P14");
            class1 obj = new class1("P1", str);
            temp.Add(obj);
        }
        return temp;
    }

The values here are hardcoded, but they will be dynamic.
If you cannot use a third-party tool such as the recommended Html Agility Pack, you can use the WebBrowser and HtmlDocument classes to parse the HTML:
    // Requires: using System.Collections.Generic; using System.Linq; using System.Windows.Forms;
    WebBrowser wbc = new WebBrowser();
    wbc.DocumentText = "foo"; // necessary to create the document
    HtmlDocument doc = wbc.Document.OpenNew(true);
    doc.Write((string)html); // insert your html-string here
    List<class1> elements = wbc.Document.GetElementsByTagName("li").Cast<HtmlElement>()
        .Where(li => li.Children.Count == 2)   // only the outer <li> items have two children: <a> and <ul>
        .Select(outerLi => new class1
        {
            parent = outerLi.FirstChild.InnerText,
            children = outerLi.Children.Cast<HtmlElement>()
                .Last().Children.Cast<HtmlElement>()
                .Select(innerLi => innerLi.FirstChild.InnerText).ToList()
        }).ToList();

[Screenshot: the result in the debugger window]

What you want is a recursive descent parser. All the other suggestions to use a library essentially amount to using a recursive descent parser for HTML or XML that someone else has written.
The basic structure of a recursive descent parser is to scan linearly through a list of tokens (in your case, a string) and, on encountering a token that opens a nested entity, call the parser again to process the sublist of tokens (the substring) that belongs to it.
You can search for the term "recursive descent parser" and find many useful results. Even the Wikipedia article is pretty good in this case and includes an example of a recursive descent parser in C.
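The structure described above could be sketched in C# roughly as follows. This is only a minimal illustration, not a general HTML parser: the `Node` and `ListParser` names are hypothetical, and it assumes the input follows exactly the shape in the question (`<li>`, one `<a ...>text</a>`, an optional `<ul>` of further items, `</li>`), with no `>` inside attribute values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node for this sketch (not the asker's class1).
class Node
{
    public string Text;
    public List<Node> Children = new List<Node>();
}

// Grammar assumed by this sketch:
//   items := item*
//   item  := "<li>" anchor [ "<ul>" items "</ul>" ] "</li>"
class ListParser
{
    private readonly string _s;
    private int _pos;

    public ListParser(string s) { _s = s; }

    // Parses zero or more <li> items starting at the current position.
    public List<Node> ParseItems()
    {
        var items = new List<Node>();
        while (TryConsume("<li>"))
            items.Add(ParseItemBody());
        return items;
    }

    // Parses the rest of one item; the opening "<li>" is already consumed.
    private Node ParseItemBody()
    {
        var node = new Node { Text = ParseAnchorText() };
        if (TryConsume("<ul>"))
        {
            node.Children = ParseItems();   // recursive call for the nested list
            Expect("</ul>");
        }
        Expect("</li>");
        return node;
    }

    // Reads "<a ...>text</a>" and returns the trimmed text.
    private string ParseAnchorText()
    {
        Expect("<a");
        int open = _s.IndexOf('>', _pos);
        if (open < 0) throw new FormatException("Unterminated <a> tag");
        int close = _s.IndexOf("</a>", open + 1, StringComparison.Ordinal);
        if (close < 0) throw new FormatException("Missing </a>");
        _pos = close + "</a>".Length;
        return _s.Substring(open + 1, close - open - 1).Trim();
    }

    // Skips whitespace, then consumes the token if it is next in the input.
    private bool TryConsume(string token)
    {
        while (_pos < _s.Length && char.IsWhiteSpace(_s[_pos])) _pos++;
        if (string.CompareOrdinal(_s, _pos, token, 0, token.Length) != 0)
            return false;
        _pos += token.Length;
        return true;
    }

    private void Expect(string token)
    {
        if (!TryConsume(token))
            throw new FormatException("Expected '" + token + "' at position " + _pos);
    }
}
```

Calling `new ListParser(html).ParseItems()` returns one `Node` per top-level `<li>`. To fill the `List<class1>` from the question, each node's `Text` would become `parent` and the `Text` of each entry in `Children` would go into `children`.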
You can also use XmlDocument:
    XmlDocument doc = new XmlDocument();
    // The input has several top-level <li> elements, so wrap it in a single
    // root element; LoadXml rejects a document with more than one root.
    doc.LoadXml("<root>" + yourInputString + "</root>");
    XmlNodeList colNodes = doc.DocumentElement.SelectNodes("li");
    foreach (XmlNode node in colNodes)
    {
        // ... your logic here
        // for example
        // string parentName = node.SelectSingleNode("a").InnerText;
        // string parentHref = node.SelectSingleNode("a").Attributes["href"].Value;
        // XmlNodeList children =
        //     node.SelectSingleNode("ul").SelectNodes("li");
        // foreach (XmlNode child in children)
        // {
        //     ......
        // }
    }
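As a fuller sketch of the XmlDocument approach, the loop could fill the list from the question along these lines. The shape of `class1` (public fields, parameterless construction) and the `XmlListReader.GetElements` helper are assumptions made for illustration, and the code only works if the input is well-formed XML, which the sample string happens to be.

```csharp
using System.Collections.Generic;
using System.Xml;

// Assumed shape of the asker's class1; the question only names its two members.
class class1
{
    public string parent;
    public List<string> children = new List<string>();
}

static class XmlListReader
{
    public static List<class1> GetElements(string html)
    {
        var doc = new XmlDocument();
        // Wrap the fragment in a single root element ("root" is an arbitrary
        // name), because the input has several top-level <li> elements.
        doc.LoadXml("<root>" + html + "</root>");

        var result = new List<class1>();
        foreach (XmlNode li in doc.DocumentElement.SelectNodes("li"))
        {
            // The first <a> directly under the outer <li> is the parent entry.
            var item = new class1 { parent = li.SelectSingleNode("a").InnerText };

            // Every <a> inside the nested <ul>/<li> is a child entry.
            foreach (XmlNode a in li.SelectNodes("ul/li/a"))
                item.children.Add(a.InnerText);

            result.Add(item);
        }
        return result;
    }
}
```

Real-world HTML is often not valid XML (unclosed tags, entities such as `&nbsp;`), in which case LoadXml throws and an HTML-aware parser is the safer choice.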