Is there a way to check file encoding using JavaScript?

Here is my case: I am working on a very large project that contains many files. Some of these files are encoded in UTF-8, others in ANSI. We need to convert all the files to UTF-8, because we decided that this will be the default in our next projects. This is a big problem because we are Brazilian and we have common words using characters like á, ç, ê, ü etc. Thus, having files in multiple encodings poses a serious problem.

Anyway, I came up with this JS file, which converts ANSI files to UTF-8, writing the converted copies to another folder and keeping the originals:

var indir = "in";
var outdir = "out";

function ansiToUtf8(fin, fout) {
    var ansi = WScript.CreateObject("ADODB.Stream");
    ansi.Open();
    ansi.Charset = "x-ansi";
    ansi.LoadFromFile(fin);

    var utf8 = WScript.CreateObject("ADODB.Stream");
    utf8.Open();
    utf8.Charset = "UTF-8";
    utf8.WriteText(ansi.ReadText());
    utf8.SaveToFile(fout, 2 /*adSaveCreateOverWrite*/);

    ansi.Close();
    utf8.Close();
}

var fso = WScript.CreateObject("Scripting.FileSystemObject");
var folder = fso.GetFolder(indir);
var fc = new Enumerator(folder.files);
for (; !fc.atEnd(); fc.moveNext()) {
    var file = fc.item();
    ansiToUtf8(indir + "\\" + file.name, outdir + "\\" + file.name);
}

which I run from the command line with:

cscript /nologo ansi2utf8.js

The problem is that this script processes all the files, even those that are already in UTF-8, and this breaks my special characters. So I need to check whether a file is already encoded in UTF-8, and run my code only if it is ANSI. How can I do this?

In addition, my script only works through the "in" folder. I am still looking for an easy way to make it go into the folders inside this folder and work there too.

1 answer

Do your UTF-8 files have a byte order mark (BOM)? If so, you can simply check the first 3 bytes (EF BB BF) to determine whether a file is UTF-8. Otherwise, the standard method is to check whether the file is valid UTF-8 all the way through to the end; if it is, it is most likely meant to be read as UTF-8.
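
The two checks described in this answer could look roughly like the sketch below. It is only an illustration, not a tested part of the original script: the helper names hasUtf8Bom, isValidUtf8 and looksLikeUtf8 are made up here, and the code assumes the caller has already read the file into a plain array of byte values (0 to 255). How those raw bytes are obtained under Windows Script Host is not shown.

```javascript
// Hypothetical helpers; `bytes` is assumed to be an array of integers 0..255
// that the caller has already read from the file.

// True when the data starts with the UTF-8 byte order mark EF BB BF.
function hasUtf8Bom(bytes) {
    return bytes.length >= 3 &&
           bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
}

// True when the whole array is well-formed UTF-8 from start to end.
function isValidUtf8(bytes) {
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i];
        var extra; // number of continuation bytes that must follow
        var min;   // smallest code point allowed for this sequence length
        var cp;    // code point being decoded

        if (b <= 0x7F) { i++; continue; }  // plain ASCII byte
        else if (b >= 0xC2 && b <= 0xDF) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if (b >= 0xE0 && b <= 0xEF) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if (b >= 0xF0 && b <= 0xF4) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else { return false; }             // stray continuation or invalid lead byte

        if (i + extra >= bytes.length) { return false; } // sequence is cut off

        for (var k = 1; k <= extra; k++) {
            var c = bytes[i + k];
            if ((c & 0xC0) !== 0x80) { return false; }   // not a 10xxxxxx byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) { return false; }                    // overlong encoding
        if (cp > 0x10FFFF) { return false; }               // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) { return false; } // surrogate code point
        i += extra + 1;
    }
    return true;
}

// A file is treated as UTF-8 if it has a BOM or validates all the way through.
function looksLikeUtf8(bytes) {
    return hasUtf8Bom(bytes) || isValidUtf8(bytes);
}
```

In the conversion loop, a file would then be passed to ansiToUtf8 only when looksLikeUtf8 returns false. A file containing nothing but ASCII passes the check too, which is harmless because such a file is byte-for-byte identical in ANSI and in UTF-8. ANSI text with accented letters fails the check in practice, since a single byte such as 0xE7 (ç) is not followed by the continuation bytes UTF-8 requires.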
