I am a complete Python noob, so please bear with me. I want Python to look at an HTML page and replace the characters pasted in from Microsoft Word (curly quotes, en and em dashes) with something UTF-8 compatible.
My question is: how do I do this in Python? (I have Googled it, but haven't found a clear answer yet.) I want to dip a toe into Python, and I think a simple task like this is a good place to start. As far as I can tell, I need to:
- load the text pasted from MS Word into a variable
- run some sort of replace function on the content
- output the result
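A minimal Python 3 sketch of those three steps. The sample string stands in for real pasted input (an assumption; in practice it might come from a file or a form field), and only three of the substitutions are shown:

```python
# Step 1: load the pasted text into a variable. The sample is encoded to
# UTF-8 bytes to mimic raw input, then decoded back into a Python str so
# each curly quote is a single character rather than three bytes.
pasted = "It\u2019s \u201cGoing Mobile\u201d".encode("utf-8")
text = pasted.decode("utf-8")

# Step 2: run a replace over the content (three substitutions only).
cleaned = (
    text.replace("\u2019", "&rsquo;")
        .replace("\u201c", "&ldquo;")
        .replace("\u201d", "&rdquo;")
)

# Step 3: output the result.
print(cleaned)  # It&rsquo;s &ldquo;Going Mobile&rdquo;
```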
In PHP, I would do it like this:
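```php
$test = $_POST['pasted_from_Word']; // for example "Going Mobile"

// Replace the UTF-8 byte sequences Word produces with HTML entities.
function defangWord($string)
{
    $search = array(
        (chr(0xe2) . chr(0x80) . chr(0x98)), // left single quote
        (chr(0xe2) . chr(0x80) . chr(0x99)), // right single quote
        (chr(0xe2) . chr(0x80) . chr(0x9c)), // left double quote
        (chr(0xe2) . chr(0x80) . chr(0x9d)), // right double quote
        (chr(0xe2) . chr(0x80) . chr(0x93)), // en dash
        (chr(0xe2) . chr(0x80) . chr(0x94)), // em dash
        (chr(0x2d))                          // ASCII hyphen-minus
    );
    $replace = array(
        "&lsquo;",
        "&rsquo;",
        "&ldquo;",
        "&rdquo;",
        "&ndash;",
        "&mdash;",
        "&ndash;"
    );
    return str_replace($search, $replace, $string);
}

echo defangWord($test);
```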
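A possible Python 3 port of `defangWord`, offered as a sketch rather than the definitive approach. Python 3 strings hold decoded characters, so it matches the characters themselves (U+2018 and so on) instead of the individual bytes that PHP's `chr()` calls build; the byte sequences appear in the comments for comparison. Mapping each character to its named HTML entity is an assumption about the intended output, and `defang_word` / `WORD_REPLACEMENTS` are illustrative names:

```python
# Characters Word typically inserts, mapped to HTML entities.
# Order and targets follow the PHP $search/$replace arrays.
WORD_REPLACEMENTS = [
    ("\u2018", "&lsquo;"),  # left single quote  (UTF-8 bytes e2 80 98)
    ("\u2019", "&rsquo;"),  # right single quote (e2 80 99)
    ("\u201c", "&ldquo;"),  # left double quote  (e2 80 9c)
    ("\u201d", "&rdquo;"),  # right double quote (e2 80 9d)
    ("\u2013", "&ndash;"),  # en dash            (e2 80 93)
    ("\u2014", "&mdash;"),  # em dash            (e2 80 94)
    ("-", "&ndash;"),       # ASCII hyphen (0x2d); note this hits every hyphen
]


def defang_word(text: str) -> str:
    """Replace Word's typographic characters with HTML entities."""
    for search, replace in WORD_REPLACEMENTS:
        text = text.replace(search, replace)
    return text


test = "\u201cGoing Mobile\u201d"
print(defang_word(test))  # &ldquo;Going Mobile&rdquo;
```

As in the PHP version, the last rule turns every plain hyphen into an en dash entity; dropping that tuple leaves ordinary hyphens alone.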
How do you do this in Python?
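If the aim is only HTML that displays correctly regardless of the page's declared encoding, one alternative (a different technique from the PHP, not a port of it) is Python's built-in `xmlcharrefreplace` codec error handler. It turns every non-ASCII character into a numeric character reference, so it covers characters the list above misses, but it leaves ASCII hyphens untouched and produces `&#8217;` rather than `&rsquo;`. The function name is hypothetical:

```python
def to_ascii_html(text: str) -> str:
    # Encode to ASCII; any character ASCII cannot represent (curly quotes,
    # dashes, accented letters, ...) becomes a numeric reference like &#8217;.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("It\u2019s \u201cGoing Mobile\u201d \u2014 caf\u00e9"))
# It&#8217;s &#8220;Going Mobile&#8221; &#8212; caf&#233;
```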
EDIT: , UTF-8 . , MS Word. , , . PHP, , , , . , , , (0xe2, 0x80 ..). HTML. , , , UTF-8, , MS Word, ?
EDIT2: , Python , . , , . UTF-8, , , - UTF-8, , , - UTF-8... Word . . Python...