I have a definition string that may contain HTML, and an array of words. I'm trying to find these words in the definition and return their start and end positions. For example, I might need to find "Hello" in:
definition = "<strong>Hel</strong>lo World!"
I can strip the HTML using sanitize from ActionView together with HTMLEntities, but that changes the index of "Hello" in the string, so:
sanitized_definition.index("Hello")
will return 0. I need a start position of 8 and an end position of 21, i.e. indices into the original string. I was thinking of mapping the entire string to my own indices, e.g.
{"1" => '<', "2" => 's', "3" => 't', .. , "9" => 'H' ...}
so 1 maps to the first character, 2 to the second, and so on, but I'm not sure what this would achieve, and it seems overly complicated. Does anyone have any ideas on how to do this?
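To make the mapping idea concrete, here is a minimal sketch. It assumes that every "<" starts a tag and the next ">" ends it (no real HTML parsing), that entities such as &amp; are not decoded, and that positions are 0-based and inclusive. The name find_in_html is hypothetical, not something from ActionView or HTMLEntities.

```ruby
# Hypothetical helper: returns [start, end] (0-based, inclusive) as indices
# into the ORIGINAL string, or nil if the text is not found.
# Assumption: tags are recognized naively as anything between "<" and ">".
def find_in_html(definition, search_text)
  return nil if search_text.empty?

  plain = String.new   # the definition with tags removed
  original_index = []  # original_index[i] = index in definition of plain[i]

  in_tag = false
  definition.each_char.with_index do |char, i|
    if char == "<"
      in_tag = true
    elsif char == ">" && in_tag
      in_tag = false
    elsif !in_tag
      plain << char
      original_index << i
    end
  end

  # Search the stripped text, then translate both ends back.
  start = plain.index(search_text)
  return nil if start.nil?

  [original_index[start], original_index[start + search_text.length - 1]]
end

find_in_html("<strong>Hel</strong>lo World!", "Hello")  # => [8, 21]
```

Instead of a hash keyed by position, an array that stores only the original index of each visible character seems to be enough: the search runs on the stripped text, and the array translates the result back.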
EDIT:
A good point was raised in the comments: it doesn’t make sense that I want to include </strong> but not the <strong> at the beginning. That was partly because I hadn’t worked out what to do with this edge case. For the purposes of this question, a better example might be something like:
definition = "Probati<strong>onary Peri</strong>od."
search_text = 'Probationary Period'
Also, having thought about this a little more, I think that in my particular case, the only HTML structure I need to worry about is .
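If only simple tags can occur, one possible alternative is a regex built from the search text that tolerates tags between any two of its characters, so the match positions come straight from the original string. This is a sketch under that assumption. The name tag_tolerant_span is hypothetical, and the tag pattern is deliberately naive.

```ruby
# Hypothetical helper: returns [start, end] (0-based, inclusive) as indices
# into the original string, or nil if there is no match.
# Assumption: a tag is anything of the form <...>; this is not an HTML parser,
# and the search text could still match characters inside a tag name.
def tag_tolerant_span(definition, search_text)
  return nil if search_text.empty?

  tag = "<[^>]*>"
  # Escape every character, then allow any number of tags between neighbours.
  pattern = search_text.each_char
                       .map { |c| Regexp.escape(c) }
                       .join("(?:#{tag})*")

  match = Regexp.new(pattern).match(definition)
  match && [match.begin(0), match.end(0) - 1]
end

definition  = "Probati<strong>onary Peri</strong>od."
search_text = 'Probationary Period'
tag_tolerant_span(definition, search_text)  # => [0, 35]
```

Because the pattern starts and ends with characters of the search text, a leading or trailing tag is never part of the reported span.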