How to concatenate two html files with BeautifulSoup?

I need to combine the bodies of two html files into one html file, with a small amount of arbitrary html as a separator between them. I have code that used to work for this, but it stopped working when I upgraded from Xubuntu 11.10 (or was it 11.04?) to 12.10, possibly because of the BeautifulSoup update (I am currently using 3.2.1; I don't know which version I had before) or the vim update (I use vim to generate the html files automatically from text files). This is a stripped-down version of the code:

from BeautifulSoup import BeautifulSoup
soup_original_1 = BeautifulSoup(''.join(open('test1.html')))
soup_original_2 = BeautifulSoup(''.join(open('test2.html')))
contents_1 = soup_original_1.body.renderContents()
contents_2 = soup_original_2.body.renderContents()
contents_both = contents_1 + "\n<b>SEPARATOR\n</b>" + contents_2
soup_new = BeautifulSoup(''.join(open('test1.html')))
while len(soup_new.body.contents):
    soup_new.body.contents[0].extract()
soup_new.body.insert(0, contents_both)

The bodies of the two input files used for the test case are very simple: contents_1 is '\n<pre>\nFile 1\n</pre>\n' and contents_2 is '\n<pre>\nFile 2\n</pre>\n'.

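I would like soup_new.body.renderContents() to be the concatenation of those two with the separator between them, but instead every < is turned into &lt; and so on. The desired result is '\n<pre>\nFile 1\n</pre>\n\n<b>SEPARATOR\n</b>\n<pre>\nFile 2\n</pre>\n', which is what I got before the upgrade; the current result is '\n&lt;pre&gt;\nFile 1\n&lt;/pre&gt;\n\n&lt;b&gt;SEPARATOR\n&lt;/b&gt;\n&lt;pre&gt;\nFile 2\n&lt;/pre&gt;\n'.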

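How do I stop BeautifulSoup from turning < into &lt; and so on when I insert html as a string into the body? Or should I be doing this in a different way altogether? (This is my first experience with BeautifulSoup and with parsing html, so that may well be the case.)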

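The html files are generated automatically from text files with vim. The full test1.html is shown below; test2.html is identical apart from its contents and title.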

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>~/programs/lab_notebook_and_printing/concatenate-html_problem_2013/test1.txt.html</title>
<meta name="Generator" content="Vim/7.3" />
<meta name="plugin-version" content="vim7.3_v10" />
<meta name="syntax" content="none" />
<meta name="settings" content="ignore_folding,use_css,pre_wrap,expand_tabs,ignore_conceal" />
<style type="text/css">
pre { white-space: pre-wrap; font-family: monospace; color: #000000; background-color: #ffffff; white-space: pre-wrap; word-wrap: break-word }
body { font-family: monospace; color: #000000; background-color: #ffffff; font-size: 0.875em }
</style>
</head>
<body>
<pre>
File 1
</pre>
</body>
</html>

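Reading HTML as text only to insert it back into HTML, and then fighting the escaping in both directions, is a lot of extra work that is hard to get right.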

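The easy thing is simply not to do that. You want to insert everything in the body of test2 after everything in the body of test1, right? So do exactly that: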

# append() moves each node out of soup_original_2, so loop over a snapshot
for element in list(soup_original_2.body.contents):
    soup_original_1.body.append(element)

To add a separator first, do the same thing with the separator:

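# new_tag() is the BeautifulSoup 4 API (from bs4 import BeautifulSoup)
b = soup_original_1.new_tag('b')
b.append('SEPARATOR')
soup_original_1.body.append(b)
# append() moves each node out of soup_original_2, so loop over a snapshot
for element in list(soup_original_2.body.contents):
    soup_original_1.body.append(element)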
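
For reference, the approach above can be put together as a self-contained sketch. It assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser"; the in-memory strings HTML_1 and HTML_2 and the function name concatenate_bodies are hypothetical stand-ins for the real files and are not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical in-memory stand-ins for test1.html and test2.html.
HTML_1 = "<html><body>\n<pre>\nFile 1\n</pre>\n</body></html>"
HTML_2 = "<html><body>\n<pre>\nFile 2\n</pre>\n</body></html>"


def concatenate_bodies(html_1, html_2, separator_text="SEPARATOR"):
    """Return the soup of html_1 with a <b> separator and the body of html_2 appended."""
    soup_1 = BeautifulSoup(html_1, "html.parser")
    soup_2 = BeautifulSoup(html_2, "html.parser")

    # Build the separator as a real tag, so nothing has to be escaped.
    separator = soup_1.new_tag("b")
    separator.string = separator_text
    soup_1.body.append(separator)

    # append() moves a node out of soup_2, so iterate over a snapshot
    # of the children rather than the live list.
    for element in list(soup_2.body.contents):
        soup_1.body.append(element)
    return soup_1


combined = concatenate_bodies(HTML_1, HTML_2)
print(combined.body.decode_contents())
```

Looping over a snapshot matters because each append() removes the node from its old parent; iterating the live child list while it shrinks would skip nodes.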

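That's it.

See the "Modifying the tree" section of the BeautifulSoup documentation for a tutorial that covers all of this.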
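
The escaping described in the question can be reproduced in a few lines. This is a minimal sketch assuming BeautifulSoup 4 with "html.parser"; the variable names are illustrative only. A plain string appended to a tag becomes a text node, and text is escaped on output, whereas a parsed fragment is appended as real tags.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body></body></html>", "html.parser")

# A plain string is appended as a text node, so its markup is escaped on output.
soup.body.append("<b>SEPARATOR</b>")
as_text = soup.body.decode_contents()

# Parsing the fragment first yields real tags, which are serialized as markup.
soup.body.clear()
fragment = BeautifulSoup("<b>SEPARATOR</b>", "html.parser")
for node in list(fragment.contents):
    soup.body.append(node)
as_markup = soup.body.decode_contents()

print(as_text)    # &lt;b&gt;SEPARATOR&lt;/b&gt;
print(as_markup)  # <b>SEPARATOR</b>
```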


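This follows up on abarnert's answer and on the note by Martijn Pieters.

As of BeautifulSoup 4.4 (released in 2015), you can append a copy of each element, which leaves the original document untouched: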

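import copy

# document1 is a hypothetical name for the soup that element comes from
for element in document1.body:
    document2.body.append(copy.copy(element))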
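
A runnable sketch of the copy-based variant, assuming BeautifulSoup 4.4 or later with "html.parser". The names document1 and document2 and their contents are hypothetical stand-ins for the two parsed files.

```python
import copy

from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two parsed files.
document1 = BeautifulSoup("<html><body><pre>File 1</pre></body></html>", "html.parser")
document2 = BeautifulSoup("<html><body><pre>File 2</pre></body></html>", "html.parser")

# Copies are appended, so document1 is not modified and can be iterated directly.
for element in document1.body:
    document2.body.append(copy.copy(element))

print(document1.body.decode_contents())  # <pre>File 1</pre>
print(document2.body.decode_contents())  # <pre>File 2</pre><pre>File 1</pre>
```

Because nothing is moved, the source document can be reused afterwards, at the cost of duplicating the copied nodes in memory.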