How can I find every span with the class 'blue' that contains text in the format:
04/18/13 7:29pm
which could be like this:
04/18/13 7:29pm
or
Posted on 04/18/13 7:29pm
In terms of building the logic for this, this is what I have so far:
new_content = original_content.find_all('span', {'class' : 'blue'}) # using beautiful soup find_all
pattern = re.compile('<span class=\"blue\">[data in the format 04/18/13 7:29pm]</span>') # using re
for _ in new_content:
    result = re.findall(pattern, _)
    print result
I was referring to https://stackoverflow.com/a/2126149/ and to https://stackoverflow.com/questions/524432/... to try to figure out a way to do this, but the above is all I have so far.
Edit:
To clarify the scenario, there are spans like:
<span class="blue">here is a lot of text that i don't need</span>
and
<span class="blue">this is the span i need because it contains 04/18/13 7:29pm</span>
Note that I need only 04/18/13 7:29pm, not the rest of the content.
Edit 2:
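To make the goal concrete, here is a minimal sketch of just the matching step. It assumes the text of each span is already available as a plain string; the list span_texts is made-up sample data and the helper name extract_timestamp is hypothetical, not part of any library:

```python
import re

# Matches timestamps such as "04/18/13 7:29pm": MM/DD/YY, a space,
# a one- or two-digit hour, two-digit minutes, then "am" or "pm".
TIMESTAMP = re.compile(r'\d\d/\d\d/\d\d \d\d?:\d\d[ap]m')

def extract_timestamp(text):
    """Return the first timestamp found in text, or None if there is none."""
    match = TIMESTAMP.search(text)
    return match.group(0) if match else None

# Sample strings standing in for the text of each <span class="blue">.
span_texts = [
    "here is a lot of text that i don't need",
    "this is the span i need because it contains 04/18/13 7:29pm",
    "Posted on 04/18/13 7:29pm",
]

# Keep only the timestamps, dropping spans that contain none.
timestamps = [t for t in map(extract_timestamp, span_texts) if t is not None]
print(timestamps)
```

With BeautifulSoup, the strings would presumably come from calling get_text() on each tag returned by find_all.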
I also tried:
pattern = re.compile('<span class="blue">.*?(\d\d/\d\d/\d\d \d\d?:\d\d\w\w)</span>')
for _ in new_content:
    result = re.findall(pattern, _)
    print result
and got the error:
'TypeError: expected string or buffer'
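My guess at the cause, as a hedged sketch: re.findall only accepts strings, while find_all returns tag objects. To keep the example runnable without BeautifulSoup installed, the class FakeTag below is a hypothetical stand-in for such a tag object, not a real bs4 class:

```python
import re

class FakeTag(object):
    """Hypothetical stand-in for a parsed tag object (not a real bs4 class)."""
    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup

pattern = re.compile(r'<span class="blue">.*?(\d\d/\d\d/\d\d \d\d?:\d\d\w\w)</span>')
tag = FakeTag('<span class="blue">Posted on 04/18/13 7:29pm</span>')

# Passing the object itself reproduces the failure: re needs a string.
try:
    re.findall(pattern, tag)
    raised = False
except TypeError:
    raised = True

# Converting to a string first lets the same pattern run.
result = re.findall(pattern, str(tag))
print(result)
```

If this is the cause, passing str(_) or _.get_text() instead of _ in the loop above should avoid the error; with get_text() the pattern would need to drop the surrounding span markup.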
user1063287