Validating user input against a target character set (encoding) in JavaScript

The scenario is as follows:

A user copies text from a website that uses the Windows-1252 character set. This text is then sent to a database that I manage, which uses the ISO-8859-1 character set (its printable characters are a subset of Windows-1252). Is there a mechanism in JavaScript to tell the user that they are trying to insert "invalid" characters into the system? Ideally it would also highlight the offending characters.

More generally, system A (the sending system) has a set of encodable characters defined as AsubE, and system B (the receiving system) has a set defined as BsubE. When AsubE lies within BsubE, everything A can send is accepted and there is no problem. The question is how to validate user input when AsubE is not a subset of BsubE, as in the scenario above.

+4
2 answers

Since some characters are not defined in the subset, you can use a regular expression to define these ranges:

    function isNotAllowed(char) {
      // Character class for the control ranges: 00 to 1f, or 7f to 9f
      return /[\x00-\x1f\x7f-\x9f]/.test(char);
    }

Highlighting the characters as well is more complex, but this function can serve as the core.
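One way the highlighting could be sketched: text pasted into a browser arrives in a JavaScript string as Unicode, so a character that ISO-8859-1 cannot store shows up as a code point above U+00FF (the Windows-1252 curly quotes, for example, become U+2018 and U+2019). The helper names below are hypothetical, and wrapping offenders in `<mark>` is only an assumption about how the page wants to display them.

```javascript
// Hypothetical helper: true for the control ranges 00-1f and 7f-9f, and for
// anything outside the ISO-8859-1 range U+0000..U+00FF.
// Note: the control ranges include tab and newline; drop them from the
// class if the input field is allowed to contain them.
function isOutsideLatin1(ch) {
  return /[\x00-\x1f\x7f-\x9f]/.test(ch) || ch.codePointAt(0) > 0xff;
}

// Escape text so it can be placed safely inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Returns { valid, html }: html is the input with each offending
// character wrapped in <mark> so the page can show it to the user.
function highlightInvalid(input) {
  let valid = true;
  let html = "";
  for (const ch of input) { // iterates by code point, not by UTF-16 unit
    if (isOutsideLatin1(ch)) {
      valid = false;
      html += "<mark>" + escapeHtml(ch) + "</mark>";
    } else {
      html += escapeHtml(ch);
    }
  }
  return { valid, html };
}
```

A form handler could call `highlightInvalid(field.value)`, block submission when `valid` is false, and write `html` into a preview element next to the field.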

+3

JavaScript has no built-in way to do this. Fortunately, neither Windows-1252 nor ISO-8859-1 is a variable-width encoding, so you can use .NET, or any other environment that understands character encodings, to generate a regular expression that tests for the difference.

For example, in .NET you can create a 256-byte array holding each possible byte value once, and then decode it with each encoding to get the corresponding string. Find the positions where the two strings differ, put those characters into a regular expression, and there you go.
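The comparison described above yields a small fixed result: the two encodings differ only in bytes 0x80 to 0x9F, where Windows-1252 assigns 27 printable characters and ISO-8859-1 has control codes. As a rough sketch, the table below hard-codes those 27 code points instead of generating them in .NET, then builds the regular expression from them. The names `CP1252_ONLY`, `cp1252OnlyRegex`, and `findCp1252Only` are hypothetical.

```javascript
// Code points that Windows-1252 assigns to bytes 0x80..0x9F
// (bytes 0x81, 0x8D, 0x8F, 0x90, and 0x9D are unassigned).
// ISO-8859-1 has no equivalents for these characters.
const CP1252_ONLY = [
  0x20ac, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021, 0x02c6, 0x2030,
  0x0160, 0x2039, 0x0152, 0x017d, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022,
  0x2013, 0x2014, 0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x017e, 0x0178,
];

// Turn a code point into a \uXXXX escape for use inside a character class.
const toEscape = (cp) => "\\u" + cp.toString(16).padStart(4, "0");

// Character class matching any of the Windows-1252-only characters.
const cp1252OnlyRegex = new RegExp(
  "[" + CP1252_ONLY.map(toEscape).join("") + "]",
  "g"
);

// Returns every Windows-1252-only character found in the text.
function findCp1252Only(text) {
  return text.match(cp1252OnlyRegex) || [];
}
```

This catches only the Windows-1252 extras. If users might paste text from other sources as well, a broader check on code points above U+00FF would be needed.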

+1
