I am looking into using ICU to process Unicode strings in my own Node.js module, because it seems that v8::String (according to these docs) has no C++ API for this.
As far as I know, V8 expects UTF-16 in ExternalStringResource and other APIs, so I would like to use ICU to handle UTF-16.
I need to:
- Iterate over the characters (code points, not just 16-bit code units) of a UTF-16 string
- Report the number of characters (code points, not just 16-bit code units) contained in a UTF-16 string
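To make the distinction between code units and code points concrete, here is a minimal hand-rolled sketch in plain C++ of the behavior I am after. It deliberately does not use ICU; the helper names (isLeadSurrogate, nextCodePoint, codePoints, countCodePoints) are hypothetical and only illustrate the two requirements above, and it assumes an unpaired surrogate should simply be passed through as its own code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only; none of these are ICU or V8 APIs.

inline bool isLeadSurrogate(char16_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isTrailSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Decode the code point that starts at index i and advance i past it.
// Assumption: an unpaired surrogate is returned unchanged as one code point.
char32_t nextCodePoint(const std::u16string& s, std::size_t& i) {
    assert(i < s.size());
    const char16_t lead = s[i++];
    if (isLeadSurrogate(lead) && i < s.size() && isTrailSurrogate(s[i])) {
        const char16_t trail = s[i++];
        // Combine the two 10-bit halves and add the supplementary-plane offset.
        return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10)
                       + (static_cast<char32_t>(trail) - 0xDC00);
    }
    return lead;
}

// Requirement 1: iterate over code points rather than 16-bit code units.
std::vector<char32_t> codePoints(const std::u16string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        out.push_back(nextCodePoint(s, i));
    }
    return out;
}

// Requirement 2: count code points rather than 16-bit code units.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        nextCodePoint(s, i);
    }
    return n;
}
```

For example, the string u"a\U0001F600b" holds four 16-bit code units but only three code points, so countCodePoints should report 3 while size() reports 4. I would like ICU to do this for me instead of maintaining code like the above.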
So I looked at the ICU documentation and found UnicodeString and CharacterIterator. However, UnicodeString does not have a fromUTF16 method, only fromUTF8 and fromUTF32.
Another thing I'm not sure about is whether the UnicodeString constructors copy the data I give them. I would very much prefer a zero-copy approach, where I work with an immutable object that performs no copy operations and just uses the buffer I point it to.
I'm also not sure whether I can just use UCharIterator (assuming I can somehow get a UChar* from my UTF-16 strings).
So my question is: how do I use ICU for the above purposes?
Thanks in advance for your answers!