
How to use Python to diff two HTML files

I want to use Python to compute the diff between two HTML files.

Example:

    html_1 = """ <p>i love it</p> """
    html_2 = """ <h2>i love it </p> """

The diff output should look like this:

    diff_html = """ <del><p>i love it</p></del><ins><h2>i love it</h2></ins> """

Is there a Python library that can help me do this?

6 answers

lxml might do something similar to what you want. From the docs:

    >>> from lxml.html.diff import htmldiff
    >>> doc1 = '''<p>Here is some text.</p>'''
    >>> doc2 = '''<p>Here is <b>a lot</b> of <i>text</i>.</p>'''
    >>> print(htmldiff(doc1, doc2))
    <p>Here is <ins><b>a lot</b> of <i>text</i>.</ins> <del>some text.</del> </p>

I don't know of any other Python library for this specific task, but you may want to look into word-by-word diffs. They may come close to what you want.

One example is this, implemented in both PHP and Python (save it as diff.py, then import diff):

    >>> diff.htmlDiff(a, b)
    '<del><p>i</del> <ins><h2>i</ins> love <del>it</p></del> <ins>it </p></ins>'
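
The word-by-word idea can also be sketched with the standard library alone. This is a rough sketch and not the linked diff.py: the helper name `word_diff` is made up for illustration, and it assumes that splitting on whitespace is a good enough notion of a "word" (so a tag glued to a word, such as `<p>i`, counts as a single token).

```python
import difflib


def word_diff(old, new):
    """Mark up the word-level changes from `old` to `new` with <del>/<ins>."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run of words: emit as-is.
            out.append(" ".join(old_words[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            out.append("<del>%s</del>" % " ".join(old_words[i1:i2]))
        if tag in ("insert", "replace"):
            out.append("<ins>%s</ins>" % " ".join(new_words[j1:j2]))
    return " ".join(out)


print(word_diff("<p>i love it</p>", "<h2>i love it </p>"))
```

For the two strings from the question, this produces the same markup as the diff.htmlDiff output shown above. Whitespace is normalised to single spaces, so it is not suitable when exact spacing matters.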

I found two Python libraries that are useful:

However, both of them use Python's difflib to diff the text, and I want to use Google's diff.


AFAIK, Python has a built-in difflib module that can do this.
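
As a rough illustration of what difflib gives you out of the box, the sketch below runs `difflib.unified_diff` on the two strings from the question. The `fromfile`/`tofile` labels are arbitrary names chosen here. Note that the result is a plain-text line diff, not `<del>`/`<ins>` markup, so some post-processing is still needed to get HTML.

```python
import difflib

html_1 = "<p>i love it</p>\n"
html_2 = "<h2>i love it </p>\n"

# unified_diff works on lists of lines and yields the diff line by line.
diff_lines = list(
    difflib.unified_diff(
        html_1.splitlines(keepends=True),
        html_2.splitlines(keepends=True),
        fromfile="html_1",  # label only, no file is read
        tofile="html_2",
    )
)

print("".join(diff_lines))
```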


Not quite what you asked for, but difflib in the standard library has a simple HtmlDiff tool that will build an HTML diff table for you.

    import difflib

    html_1 = """ <p>i love it</p> """
    html_2 = """ <h2>i love it </p> """

    htmldiff = difflib.HtmlDiff()
    # each argument is a list of lines
    html_table = htmldiff.make_table([html_1], [html_2])
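
To make the difference between the two HtmlDiff entry points concrete, here is a small sketch: `make_table` returns only the `<table>` fragment, while `make_file` returns a complete page including the styles. The `fromdesc`/`todesc` column labels are arbitrary choices for this example. The input markup is escaped in the output, so the result is a side-by-side view of the source text rather than inline `<del>`/`<ins>` tags.

```python
import difflib

html_1 = "<p>i love it</p>"
html_2 = "<h2>i love it </p>"

differ = difflib.HtmlDiff()

# Only the <table> element, for embedding in a page you already have.
table = differ.make_table(
    html_1.splitlines(), html_2.splitlines(),
    fromdesc="html_1", todesc="html_2",
)

# A complete standalone HTML document (table plus CSS and legend).
page = differ.make_file(
    html_1.splitlines(), html_2.splitlines(),
    fromdesc="html_1", todesc="html_2",
)

print(len(table), len(page))
```

Writing `page` to a file and opening it in a browser is the quickest way to inspect the result.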

You can use difflib.ndiff() and replace the leading "-" / "+" markers with your desired HTML.

    import difflib

    html_1 = """ <p>i love it</p> """
    html_2 = """ <h2>i love it </p> """

    diff_html = ""
    theDiffs = difflib.ndiff(html_1.splitlines(), html_2.splitlines())
    for eachDiff in theDiffs:
        # lines starting with "-" exist only in html_1, "+" only in html_2
        if eachDiff[0] == "-":
            diff_html += "<del>%s</del>" % eachDiff[1:].strip()
        elif eachDiff[0] == "+":
            diff_html += "<ins>%s</ins>" % eachDiff[1:].strip()

    print(diff_html)

Result:

 <del><p>i love it</p></del><ins><h2>i love it </p></ins> 

Check out diff2HtmlCompare (full disclosure: I'm the author). If you simply want to visualize the differences, it may help you. If you want to compute a diff and do something with it, you can use difflib as others have suggested (the script above just wraps difflib and uses Pygments for syntax highlighting). Doug Hellmann has described how to use difflib pretty well; I suggest checking out his tutorial.
