I am writing a script to export my bookmarks (the links and their titles) from Chrome to HTML.
Chrome stores its bookmarks as JSON in UTF-8 encoding.
Some names are in Russian, so they are stored like this:
"name": "\u0425\u0430\u0431\u0440\..."
import codecs

f = codecs.open("chrome.json", "r", "utf-8")
data = f.readlines()
urls = []
names = []
ind = 0
for i in data:
    if i.find('"url":') != -1:
        urls.append(i.split('"')[3])
        # the "name" line sits two lines above the "url" line
        names.append(data[ind-2].split('"')[3])
    ind += 1
fw = codecs.open("chrome.html", "w", "utf-8")
fw.write("<html><body>\n")
for n in names:
    fw.write(n + '<br>')
fw.write("</body></html>")
Now, in chrome.html, the Russian names are displayed as literal \u0425\u0430\u0431... sequences.
How can I convert them back to Russian?
I am using Python 2.5.
**Edit: Solved!**
>>> s = '\u041f\u0440\u0438\u0432\u0435\u0442 world!'
>>> type(s)
<type 'str'>
>>> print s.decode('raw-unicode-escape').encode('utf-8')
Привет world!
What I needed was to convert the str containing the literal \u041f... escapes to unicode, then encode the result as UTF-8 when writing it out.
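For anyone reading this on Python 3, where `str` no longer has a `decode` method, the same conversion might look roughly like the sketch below. It is not part of my script: the helper name `unescape` is made up for illustration, and it assumes the input bytes are plain ASCII with literal `\uXXXX` sequences, as in the session above.

```python
def unescape(raw):
    # raw-unicode-escape turns each literal backslash-u sequence into the
    # character it names and passes the remaining bytes through unchanged.
    return raw.decode('raw-unicode-escape')

# A raw bytes literal keeps the backslashes exactly as they appear in the file.
s = rb'\u041f\u0440\u0438\u0432\u0435\u0442 world!'
text = unescape(s)
print(text == 'Привет world!')  # prints True
```

The result is an ordinary Python 3 `str`, so it can be written to a file opened with `encoding='utf-8'` without any further encode call.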
f = open("chrome.json", "r")
data = f.readlines()
f.close()
urls = []
names = []
ind = 0
for i in data:
    if i.find('"url":') != -1:
        urls.append(i.split('"')[3])
        # the "name" line sits two lines above the "url" line
        names.append(data[ind-2].split('"')[3])
    ind += 1
fw = open("chrome.html", "w")
fw.write("<html><body>\n")
for n in names:
    # turn the literal \uXXXX escapes into unicode, then write UTF-8 bytes
    fw.write(n.decode('raw-unicode-escape').encode('utf-8') + '<br>')
fw.write("</body></html>")
fw.close()
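An alternative to splitting lines by hand is to let a JSON parser do the work, since decoding `\uXXXX` escapes is part of parsing JSON. The sketch below is written for Python 3 (the `json` module is not in the Python 2.5 standard library) and is only an illustration: the function names `collect_bookmarks` and `bookmarks_to_html` and the sample data are made up, and it assumes only that each bookmark entry is an object with both a `"name"` and a `"url"` key, as the script above implies.

```python
import json
from html import escape

def collect_bookmarks(node, found):
    # Walk every dict and list in the parsed tree and gather (name, url) pairs.
    if isinstance(node, dict):
        # Assumption: a bookmark entry has both a "name" and a "url" key.
        if 'url' in node and 'name' in node:
            found.append((node['name'], node['url']))
        for value in node.values():
            collect_bookmarks(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_bookmarks(item, found)
    return found

def bookmarks_to_html(json_text):
    tree = json.loads(json_text)  # the \uXXXX escapes are decoded here
    lines = ['<html><body>']
    for name, url in collect_bookmarks(tree, []):
        lines.append('<a href="%s">%s</a><br>' % (escape(url), escape(name)))
    lines.append('</body></html>')
    return '\n'.join(lines)

# Made-up sample data, shaped like the "name"/"url" entries in the file.
sample = r'''{"roots": {"bookmark_bar": {"children": [
    {"name": "\u0425\u0430\u0431\u0440", "type": "url", "url": "http://example.com/"}
]}}}'''
html_page = bookmarks_to_html(sample)
```

To produce the file, the returned string could be written with `open("chrome.html", "w", encoding="utf-8")`. Walking the whole tree avoids relying on the name being exactly two lines above the url.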