I am parsing a bunch of HTML with BeautifulSoup, and everything is going fine except for one minor issue. I want to save the output as a single-line string, but my current output looks like this:
<li><span class="plaincharacterwrap break"> Zazzafooky but one two three! </span></li> <li><span class="plaincharacterwrap break"> Zazzafooky2 </span></li> <li><span class="plaincharacterwrap break"> Zazzafooky3 </span></li>
Ideally, I would like:
<li><span class="plaincharacterwrap break">Zazzafooky but one two three!</span></li><li><span class="plaincharacterwrap break">Zazzafooky2</span></li>
There is a lot of redundant whitespace that I would like to get rid of, but strip() does not necessarily remove it, and I cannot simply delete every space because I need to keep the text intact. How can I do this? It seems like a common enough problem that a regex would be overkill, but is that the only way?
I don't have any <pre> tags, so I can be a bit more aggressive about removing whitespace.
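For reference, here is a minimal regex-based sketch of the result I am after. It relies on the assumption above that there are no <pre> tags (or other whitespace-sensitive content), and the function name squeeze_html is only a placeholder I made up for illustration. It works on the serialized string, for example str(soup), and uses only the standard re module:

```python
import re

def squeeze_html(html: str) -> str:
    """Collapse whitespace runs and drop whitespace touching tag boundaries.

    Hypothetical helper; assumes no <pre> or other whitespace-sensitive markup.
    """
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    html = re.sub(r"\s+", " ", html)
    # Drop whitespace directly after a '>' and directly before a '<'.
    html = re.sub(r">\s+", ">", html)
    html = re.sub(r"\s+<", "<", html)
    return html.strip()

messy = (
    '<li><span class="plaincharacterwrap break"> Zazzafooky but one two three! </span></li> '
    '<li><span class="plaincharacterwrap break"> Zazzafooky2 </span></li>'
)
print(squeeze_html(messy))
```

One caveat with this sketch: it also removes a space between text and an inline tag, so "one <b>two</b>" becomes "one<b>two</b>". That is fine for my markup, but it would change how other HTML renders.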
Thanks again!
python regex html-parsing beautifulsoup
Rio