Convert from ANSI to UTF-8


I have about 600,000 files encoded in ANSI, and I want to convert them to UTF-8. I can do this one file at a time in Notepad++, but that is not practical for 600,000 files. Can I do this in R or Python?

I found this link, but the Python script in it does not run: Notepad++ converts ANSI-encoded file to UTF-8

2 answers

Why don't you read the file and write it as UTF-8? You can do it in Python.

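    # codecs.open handles the decoding and encoding
    import codecs

    # read the input file using its ANSI code page
    # ("cp1252" is an assumption; on Windows, "mbcs" selects the system code page)
    with codecs.open(path, 'r', encoding='cp1252') as file:
        lines = file.read()

    # write the output file as UTF-8
    with codecs.open(path, 'w', encoding='utf8') as file:
        file.write(lines)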
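For 600,000 files, the same read-then-write idea can be wrapped in a loop over a directory tree. The following is only a sketch under stated assumptions: the helper names `convert_file` and `convert_tree`, the `*.txt` pattern, and the `cp1252` source encoding are illustrative choices, not something given in the question. It uses Python 3's built-in `open` rather than `codecs.open`, and writes each result to a temporary file before replacing the original.

```python
import os
import tempfile
from pathlib import Path


def convert_file(path, source_encoding="cp1252", block_size=1024 * 1024):
    """Re-encode one file to UTF-8 in place, going through a temporary file."""
    path = Path(path)
    tmp_name = None
    try:
        # newline="" keeps the original line endings untouched.
        with open(path, "r", encoding=source_encoding, newline="") as src, \
             tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                         dir=path.parent, delete=False) as dst:
            tmp_name = dst.name
            # Copy in blocks so large files are never held in memory at once.
            for chunk in iter(lambda: src.read(block_size), ""):
                dst.write(chunk)
        # Replace the original only after the whole file converted cleanly.
        os.replace(tmp_name, path)
    except BaseException:
        if tmp_name is not None and os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def convert_tree(root, pattern="*.txt", source_encoding="cp1252"):
    """Convert every matching file under root; return (path, error) pairs."""
    failed = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        try:
            convert_file(path, source_encoding)
        except (UnicodeError, OSError) as exc:
            failed.append((path, exc))
    return failed
```

Note that this sketch cannot tell whether a file is already UTF-8, so running it twice over the same files would corrupt any non-ASCII characters. Trying it on a copy of a few files first is advisable.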

I appreciate that this is an old question, but having only recently solved a similar problem, I decided to share my solution.

I had a file produced by one program that I needed to import into a sqlite3 database, but the text file was always "ANSI", and sqlite3 required UTF-8.

Python recognizes the ANSI encoding as "mbcs" (a codec that is available on Windows only), so the code I used, adapted from something else I found, is:

    import codecs

    # convert in blocks of roughly 1 MiB so large files are not loaded whole
    blockSize = 1048576
    with codecs.open("your ANSI source file.txt", "r", encoding="mbcs") as sourceFile:
        with codecs.open("Your UTF-8 output file.txt", "w", encoding="UTF-8") as targetFile:
            while True:
                contents = sourceFile.read(blockSize)
                if not contents:
                    # an empty read means end of file
                    break
                targetFile.write(contents)

The link below has some information on the encoding names that I found during my research:

https://docs.python.org/2.4/lib/standard-encodings.html
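Because "mbcs" exists only in Windows builds of Python, a script that may also run elsewhere needs a fallback. Here is a minimal sketch, assuming the files use the Western European code page `cp1252` when "mbcs" is unavailable. The helper names `ansi_codec_name` and `to_utf8` are hypothetical, and the fallback code page should be changed to match where the files actually came from.

```python
import codecs


def ansi_codec_name(fallback="cp1252"):
    """Return "mbcs" where Python provides it (Windows), else the fallback."""
    try:
        codecs.lookup("mbcs")
        return "mbcs"
    except LookupError:
        # Not on Windows: "mbcs" is unknown, so use the assumed code page.
        return fallback


def to_utf8(data, source_encoding=None):
    """Decode ANSI bytes and re-encode them as UTF-8 bytes."""
    if source_encoding is None:
        source_encoding = ansi_codec_name()
    return data.decode(source_encoding).encode("utf-8")
```

For example, `to_utf8(b"caf\xe9", "cp1252")` returns `b"caf\xc3\xa9"`, the UTF-8 encoding of "café".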

