TStringList behavior with non-ANSI files

In my application, when I want to import a file, I use TStringList.

But when someone exports data from Excel, the file encoding is UCS-2 Little Endian, and TStringList cannot read the data.

Is there a way to detect this situation, determine the text encoding, and warn the user that the provided text is incompatible?

To be clear, the user should provide only plain text and numbers; anything else must trigger a warning.

A Unicode file without a BOM is fine (TStringList can read it!).
An ANSI file is fine too (TStringList can read it!).
Even a Unicode file with a BOM would be acceptable if there is a way to remove the BOM. (TStringList can read it, but the text then starts with stray characters such as "ï" and an inverted question mark "¿", which are the BOM bytes.)

1 answer

In Delphi 6, I used the following function to detect Unicode byte order marks (BOMs).

    uses
      SysUtils, Classes;

    const
      // Standard byte order marks (BOMs)
      UTF8BOM: array [0..2] of AnsiChar = #$EF#$BB#$BF;
      UTF16LittleEndianBOM: array [0..1] of AnsiChar = #$FF#$FE;
      UTF16BigEndianBOM: array [0..1] of AnsiChar = #$FE#$FF;
      UTF32LittleEndianBOM: array [0..3] of AnsiChar = #$FF#$FE#$00#$00;
      UTF32BigEndianBOM: array [0..3] of AnsiChar = #$00#$00#$FE#$FF;

    function FileHasUnicodeBOM(const FileName: string): Boolean;
    var
      Buffer: array [0..3] of AnsiChar;
      Stream: TFileStream;
    begin
      // fmShareDenyWrite allows other programs read access at the same time.
      Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
      try
        // Fill with characters that we are not expecting, then...
        FillChar(Buffer, SizeOf(Buffer), $AA);
        // ...read up to SizeOf(Buffer) bytes; there may not be enough.
        // Use Read rather than ReadBuffer so that no exception is raised
        // if we can't fill Buffer.
        Stream.Read(Buffer, SizeOf(Buffer));
      finally
        FreeAndNil(Stream);
      end;
      Result :=
        CompareMem(@UTF8BOM, @Buffer, SizeOf(UTF8BOM)) or
        CompareMem(@UTF16LittleEndianBOM, @Buffer, SizeOf(UTF16LittleEndianBOM)) or
        CompareMem(@UTF16BigEndianBOM, @Buffer, SizeOf(UTF16BigEndianBOM)) or
        CompareMem(@UTF32LittleEndianBOM, @Buffer, SizeOf(UTF32LittleEndianBOM)) or
        CompareMem(@UTF32BigEndianBOM, @Buffer, SizeOf(UTF32BigEndianBOM));
    end;

This detects all the standard BOMs. You can use it to reject such files if you need that behavior.
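If you also need to know which encoding the BOM indicates (the question asks to determine the encoding), the same idea can return a value instead of a Boolean. The following is a minimal sketch, assuming Delphi 6; `TBOMKind`, `StartsWithBytes`, `DetectBOMInBuffer` and `DetectBOM` are hypothetical names, not part of the VCL. Note that the UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE), so the longer mark has to be tested first.

```pascal
uses
  SysUtils, Classes;

type
  // Hypothetical enumeration of the byte order marks recognised here.
  TBOMKind = (bkNone, bkUTF8, bkUTF16LE, bkUTF16BE, bkUTF32LE, bkUTF32BE);

// Hypothetical helper: True if the first Count valid bytes of Buffer
// start with Mark.
function StartsWithBytes(const Buffer: array of Byte; Count: Integer;
  const Mark: array of Byte): Boolean;
var
  I: Integer;
begin
  Result := (Count >= Length(Mark)) and (Length(Buffer) >= Length(Mark));
  if not Result then
    Exit;
  for I := 0 to High(Mark) do
    if Buffer[I] <> Mark[I] then
    begin
      Result := False;
      Exit;
    end;
end;

// Classifies the BOM (if any) at the start of Buffer.
function DetectBOMInBuffer(const Buffer: array of Byte; Count: Integer): TBOMKind;
begin
  // FF FE 00 00 (UTF-32 LE) starts with FF FE (UTF-16 LE), so test it first.
  if StartsWithBytes(Buffer, Count, [$FF, $FE, $00, $00]) then
    Result := bkUTF32LE
  else if StartsWithBytes(Buffer, Count, [$00, $00, $FE, $FF]) then
    Result := bkUTF32BE
  else if StartsWithBytes(Buffer, Count, [$EF, $BB, $BF]) then
    Result := bkUTF8
  else if StartsWithBytes(Buffer, Count, [$FF, $FE]) then
    Result := bkUTF16LE
  else if StartsWithBytes(Buffer, Count, [$FE, $FF]) then
    Result := bkUTF16BE
  else
    Result := bkNone;
end;

// Reads up to four bytes from the start of the file and classifies them.
function DetectBOM(const FileName: string): TBOMKind;
var
  Buffer: array [0..3] of Byte;
  Stream: TFileStream;
  Count: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Count := Stream.Read(Buffer, SizeOf(Buffer));
  finally
    Stream.Free;
  end;
  Result := DetectBOMInBuffer(Buffer, Count);
end;
```

With this, an Excel "Unicode text" export (UTF-16 LE with a BOM) would come back as `bkUTF16LE`, and you can choose the warning message per encoding.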

You state that the Delphi 6 TStringList can load 16-bit encoded files if they do not have a BOM. Although that may appear to be true, you will find that for characters in the ASCII range every other character is #0. I doubt that is what you want.
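That observation can itself serve as a cheap check: ASCII-range text saved as UTF-16 has a zero byte in every other position, whereas an ANSI (or BOM-less UTF-8) text file normally contains no zero bytes at all. The following is a minimal heuristic sketch, assuming the files you accept never legitimately contain #0; `BufferContainsZeroByte` and `FileContainsZeroBytes` are hypothetical helpers, and only the first 4 KB of the file is sampled.

```pascal
uses
  SysUtils, Classes;

// Hypothetical helper: True if any of the first Count bytes of Buffer is zero.
function BufferContainsZeroByte(const Buffer: array of Byte; Count: Integer): Boolean;
var
  I: Integer;
begin
  Result := False;
  if Count > Length(Buffer) then
    Count := Length(Buffer);
  for I := 0 to Count - 1 do
    if Buffer[I] = 0 then
    begin
      Result := True;
      Exit;
    end;
end;

// Samples up to the first 4 KB of the file. ANSI or BOM-less UTF-8 text
// should not contain zero bytes; UTF-16/UTF-32 text in the ASCII range does.
function FileContainsZeroBytes(const FileName: string): Boolean;
const
  SampleSize = 4096;
var
  Buffer: array [0..SampleSize - 1] of Byte;
  Stream: TFileStream;
  Count: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Count := Stream.Read(Buffer, SizeOf(Buffer));
  finally
    Stream.Free;
  end;
  Result := BufferContainsZeroByte(Buffer, Count);
end;
```

Because it does not look for a BOM at all, this also catches 16-bit files saved without one, which the BOM check cannot.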

If you want to detect Unicode text in files without a BOM, you can use the Windows API function IsTextUnicode. However, it can give false positives. This is a situation where I suspect it is better to ask forgiveness than permission.
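A minimal sketch of calling IsTextUnicode on a sample from the start of the file, assuming Delphi 6 on Windows. The function is exported by advapi32.dll; it is declared by hand here under the hypothetical alias `IsTextUnicodeAPI` so the snippet does not depend on whether your copy of Windows.pas already declares it. Passing nil as the last parameter asks Windows to apply all of its tests. IsTextUnicode is aimed at UTF-16 (what Windows calls "Unicode"), so a UTF-8 file will generally not be reported, and, as noted, the result is only a statistical guess. `BufferLooksLikeUTF16` and `FileLooksLikeUTF16` are hypothetical helpers.

```pascal
uses
  Windows, SysUtils, Classes;

// Declared by hand; IsTextUnicode is exported by advapi32.dll.
// Passing nil as lpiResult applies all available tests.
function IsTextUnicodeAPI(lpv: Pointer; iSize: Integer;
  lpiResult: Pointer): BOOL; stdcall;
  external 'advapi32.dll' name 'IsTextUnicode';

// Hypothetical helper: asks Windows whether the first Count bytes of Buffer
// look like UTF-16 text. Heuristic only; false positives are possible.
function BufferLooksLikeUTF16(const Buffer: array of Byte; Count: Integer): Boolean;
begin
  if Count > Length(Buffer) then
    Count := Length(Buffer);
  Result := False;
  // Fewer than two bytes cannot hold even one UTF-16 code unit.
  if Count >= 2 then
    Result := IsTextUnicodeAPI(@Buffer[0], Count, nil);
end;

// Samples up to the first 4 KB of the file.
function FileLooksLikeUTF16(const FileName: string): Boolean;
const
  SampleSize = 4096;
var
  Buffer: array [0..SampleSize - 1] of Byte;
  Stream: TFileStream;
  Count: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Count := Stream.Read(Buffer, SizeOf(Buffer));
  finally
    Stream.Free;
  end;
  Result := BufferLooksLikeUTF16(Buffer, Count);
end;
```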

If I were you, I would not try to block Unicode files; I would read them. Use the TNT Unicode library. The class you want is called TWideStringList.
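If the files you need to accept are UTF-8 with a BOM (the case in the question where the first line starts with stray characters), a lighter alternative that stays with the plain VCL TStringList is to load the file as usual and drop the three BOM bytes from the start of the first line. The following is a minimal sketch, assuming Delphi 6, where `string` is an 8-bit AnsiString so the BOM arrives as the characters #$EF#$BB#$BF; `StripUTF8BOMFromString` and `StripUTF8BOM` are hypothetical helpers. This only removes the marker: any non-ASCII characters are still UTF-8 encoded and would need converting separately (Delphi 6's Utf8ToAnsi is one option).

```pascal
uses
  Classes;

const
  // The UTF-8 BOM as it appears in an 8-bit (AnsiString) string.
  UTF8BOMChars = #$EF#$BB#$BF;

// Hypothetical helper: returns S without a leading UTF-8 BOM, if present.
function StripUTF8BOMFromString(const S: string): string;
begin
  if Copy(S, 1, Length(UTF8BOMChars)) = UTF8BOMChars then
    Result := Copy(S, Length(UTF8BOMChars) + 1, MaxInt)
  else
    Result := S;
end;

// Hypothetical helper: removes a leading UTF-8 BOM from the first line
// of a list that was filled with LoadFromFile.
procedure StripUTF8BOM(Lines: TStrings);
begin
  if Lines.Count > 0 then
    Lines[0] := StripUTF8BOMFromString(Lines[0]);
end;
```

Typical use is to call `StripUTF8BOM(MyList)` right after `MyList.LoadFromFile(FileName)`; files without a BOM pass through unchanged.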
