Python XPath syntax tag with apostrophe

Question


I am new to XPath. I am trying to parse a page using XPath. I need to get the information from a tag, but the escaped apostrophe in the title breaks everything.

For parsing, I use Grab.

The tag from the source:

 <img src='somelink' border='0' alt='commission:Alfred\'s misadventures' title='commission:Alfred\'s misadventures'>

Actual XPath:

 g.xpath('.//tr/td/a[3]/img').get('title')

Returns

 commission:Alfred\\

Is there any way to fix this?

Thanks.

python parsing xpath apostrophe

Stanislav Golovanov Dec 10 '11 at 20:30


2 answers

Wayne Burkett · Answer 1 · 2011-12-10T21:14:03+0000

Garbage in, garbage out. Your input is malformed because it escapes the single quote character improperly. Many programming languages (including Python) use the backslash character to escape quotation marks in string literals. XML does not. You must either 1) surround the attribute value with double quotation marks; or 2) use &apos; to include a single quote.

From the XML specification:

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".
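Both options can be checked quickly in Python. This is only a sketch: it uses the standard library's xml.etree.ElementTree instead of Grab so that it stays self-contained, and the helper names title_of and is_well_formed are made up for illustration. A strict XML parser rejects the backslash form outright, whereas a lenient HTML parser, as in the question, simply ends the attribute value at the apostrophe.

```python
import xml.etree.ElementTree as ET

# Option 1: double-quoted attribute value with a literal apostrophe inside.
double_quoted = """<img title="commission:Alfred's misadventures"/>"""

# Option 2: single-quoted attribute value with the apostrophe written as &apos;.
entity_escaped = "<img title='commission:Alfred&apos;s misadventures'/>"

# The form from the question: a backslash does not escape anything in XML.
backslash_escaped = r"<img title='commission:Alfred\'s misadventures'/>"


def title_of(markup):
    # Hypothetical helper: parse one element and return its title attribute.
    return ET.fromstring(markup).get("title")


def is_well_formed(markup):
    # Hypothetical helper: True if a strict XML parser accepts the markup.
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

With these definitions, title_of returns the same text for double_quoted and entity_escaped, while is_well_formed(backslash_escaped) is False.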

Dimitre Novatchev · Answer 2 · 2011-12-10T23:56:49+0000

Since the provided "XML" is not a well-formed document due to the nested apostrophes, an XPath expression cannot be evaluated on it.

The provided malformed text can be corrected to:

 <img src="somelink" border="0" alt="commission:Alfred's misadventures" title="commission:Alfred's misadventures"/>

If there is some strange requirement not to use double quotes, then one correct conversion is:

 <img src='somelink' border='0' alt='commission:Alfred&apos;s misadventures' title='commission:Alfred&apos;s misadventures'/>

If you are given malformed input, then in a language such as C# you can try to convert it to a well-formed equivalent using:

 string correctXml = input.Replace("\\'s", "&apos;s");

Perhaps there is a similar way to do the same in Python.
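In Python, str.replace does the same job. The following is a minimal sketch rather than a definitive fix: the function name repair_backslash_apostrophes and the sample raw_tag are hypothetical, it replaces every \' (not only \'s as in the C# line), and it assumes a backslash followed by an apostrophe never occurs legitimately in the page. The sample tag is written in self-closed form so the standard library's strict XML parser can read it; with Grab, the repaired text would be handed to the parser in place of the original page body.

```python
import xml.etree.ElementTree as ET


def repair_backslash_apostrophes(markup):
    # Hypothetical helper: turn each backslash-escaped apostrophe into the
    # &apos; entity, which is valid inside a single-quoted XML attribute.
    return markup.replace("\\'", "&apos;")


# Sample input modelled on the tag from the question (self-closed here).
raw_tag = (
    r"<img src='somelink' border='0' "
    r"alt='commission:Alfred\'s misadventures' "
    r"title='commission:Alfred\'s misadventures'/>"
)

fixed_tag = repair_backslash_apostrophes(raw_tag)
title = ET.fromstring(fixed_tag).get("title")
print(title)
```

If the target parser treats the input as HTML rather than XML, the numeric reference &#39; is an alternative replacement string for the same character.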
