OK, I'm at a loss for what to call this question. I have some HTML files, probably written by Lord Lucifer himself, which I need to parse. They consist of many segments like this, among other HTML tags:
<p>HeadingNumber</p> <p style="text-indent:number;margin-top:neg_num ">Heading Text</p> <p>Body</p>
Please note that the heading number and the heading text are in separate p tags, aligned horizontally with CSS. The CSS can be anything Lucifer wants: a mixture of indentation, padding, margins and positioning.
However, this heading line is a single object in my business model and should be stored as such. So, how can I determine whether two p elements are visually on the same line and process them accordingly? The HTML files are well-formed, if that helps.
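To make the goal concrete, here is a rough sketch of the kind of check I have in mind. It is only a heuristic under loud assumptions: it treats a p element whose inline style has a negative margin-top as being pulled up onto the previous p's line, it only looks at inline style attributes (no stylesheets, no positioning), and it assumes the markup parses as XML with a single root element and no namespace. The names LineGrouper, IsPulledUp and MergeLines are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class LineGrouper
{
    // Assumption: a negative inline margin-top means the paragraph is
    // shifted up so that it renders on the previous paragraph's line.
    private static readonly Regex NegativeMarginTop =
        new Regex(@"margin-top\s*:\s*-[\d.]", RegexOptions.IgnoreCase);

    public static bool IsPulledUp(XElement p)
    {
        // The cast yields null when the attribute is missing.
        string style = (string)p.Attribute("style") ?? "";
        return NegativeMarginTop.IsMatch(style);
    }

    // Returns one string per visual line; paragraphs judged to share a
    // line are joined with a single space.
    public static List<string> MergeLines(string wellFormedHtml)
    {
        // Requires XML-well-formed input with one root element.
        XElement root = XElement.Parse(wellFormedHtml);
        var lines = new List<string>();

        foreach (XElement p in root.Descendants("p"))
        {
            string text = p.Value.Trim();
            if (lines.Count > 0 && IsPulledUp(p))
                lines[lines.Count - 1] += " " + text; // same visual line
            else
                lines.Add(text);                      // starts a new line
        }
        return lines;
    }
}
```

For example, calling LineGrouper.MergeLines on `<div><p>1.2</p><p style="text-indent:40px;margin-top:-18px">Heading Text</p><p>Body</p></div>` gives two entries, "1.2 Heading Text" and "Body". This obviously breaks as soon as the alignment is done with positioning or an external stylesheet, which is exactly the part I don't know how to handle.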
html c# parsing
Midhat