PL/SQL functions for processing arbitrary character sets (well, as far as the RDBMS knows about them) live in the UTL_I18N and UTL_RAW packages. For your specific problem, I would suggest a test like the following:
select <pk_column_of_table_to_check>,
       instr(utl_i18n.string_to_raw(<column_to_test>, 'UTF8'),
             hextoraw(<hex_rep_in_utf8>))
  from <table_to_check>;
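For intuition, here is a minimal Python sketch of the same byte-level test (illustrative only, not part of the Oracle solution; the euro sign and its UTF-8 hex E282AC are just example values): encode the column value to UTF-8 and search for the target character's byte sequence.

```python
# Illustrative analogue of instr(utl_i18n.string_to_raw(col, 'UTF8'),
# hextoraw('E282AC')): search the UTF-8 bytes for a byte pattern.
text = "price: 100€"                  # sample column value
needle = bytes.fromhex("E282AC")      # UTF-8 bytes of U+20AC (euro sign)
pos = text.encode("utf-8").find(needle)
print(pos != -1)                      # True when the character is present
```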
If you want to check for Unicode characters whose UTF-8 representation is not available to you, use the expression

utl_raw.convert(hextoraw(<hex_rep_in_utf16>), 'UTF8', 'AL16UTF16')

(note that the closing parenthesis of hextoraw goes before the character set arguments, that utl_raw.convert takes the target character set before the source character set, and that Oracle's name for UTF-16 is AL16UTF16)
as the second argument to instr. Do not rely on the absolute positions returned by instr; use only the dichotomy 0 / non-0, since you are comparing not at the character level but at the byte level.
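As a hedged Python analogue of what utl_raw.convert does in that expression (the hex value 20AC is chosen for illustration): take the UTF-16 big-endian bytes, decode them, and re-encode as UTF-8.

```python
# Convert a UTF-16BE byte sequence to its UTF-8 equivalent, the way
# utl_raw.convert(hextoraw(...), <to UTF-8>, <from UTF-16>) would.
utf16_hex = "20AC"                    # U+20AC (euro sign) in UTF-16BE
utf8_hex = (bytes.fromhex(utf16_hex)
            .decode("utf-16-be")
            .encode("utf-8")
            .hex().upper())
print(utf8_hex)                       # E282AC
```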
UTF-8 and UTF-16 are two different byte-level encodings of the Unicode character set (in the sense of named character objects); details can be found on Wikipedia and at unicode.org.
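To make the distinction concrete, a small Python sketch showing one code point with two different byte-level representations (the character is an arbitrary example):

```python
ch = "é"                                   # the abstract character U+00E9
print(f"code point : U+{ord(ch):04X}")     # U+00E9
print("UTF-8 bytes :", ch.encode("utf-8").hex())      # c3a9
print("UTF-16 bytes:", ch.encode("utf-16-be").hex())  # 00e9
```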
Note that UTF-8 is designed so that substring tests can safely be run at the byte level. Also note that the UTF-16 encoding should be easy to obtain, since for characters in the Basic Multilingual Plane it is simply the familiar U+<4 hex digits> notation for Unicode code points.
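Both properties can be checked in a few lines of Python (the Cyrillic characters are arbitrary examples):

```python
# 1) UTF-8 is self-synchronizing: one character's byte sequence never
#    matches inside a different character's byte sequence.
assert "ш".encode("utf-8").hex() == "d188"              # U+0448
assert "ш".encode("utf-8") not in "щ".encode("utf-8")   # U+0449 = d189

# 2) For BMP characters, the U+XXXX notation *is* the UTF-16BE code unit.
cp = 0x20AC                                             # U+20AC
assert chr(cp).encode("utf-16-be").hex().upper() == "20AC"
print("ok")
```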
The byte-level representation of the incriminated characters should be available from the relevant standard (XML). Otherwise, you should have an idea of what the character is called and look it up in the code point database at unicode.org or somewhere else. There are also online conversion tools for the case where you only know the encoding name but have a sample text in a file on your system; I can find URIs if you need them.
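If you only know the character's name, Python's standard unicodedata module can stand in for such a lookup (the "EURO SIGN" name is just an example):

```python
import unicodedata

ch = unicodedata.lookup("EURO SIGN")     # find the character by name
print(f"U+{ord(ch):04X}")                # U+20AC, its code point
print(ch.encode("utf-8").hex().upper())  # E282AC, its UTF-8 bytes
```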
Hope this helps.
PS: after a closer reading of your first comment, I think you may be on a mission impossible: to correctly interpret byte sequences coming from single-byte encodings, you need to know which encoding was used, and exactly that information is lost when the user copies text from a word processor (set to some specific encoding) into the database (where it is stored in the database character set): only the byte sequence is copied. You may be lucky if both ends are set to Unicode and the DB character set is UTF8 (so at least some broken copies will fail visibly), but once the data is in the database, it will be hard to restore the original (possibly with dictionary support).
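When you do get lucky, the classic repair is to reverse the wrong decode; a hedged Python sketch (this only works if no bytes were lost or replaced on the way in, which is exactly the condition that usually fails):

```python
# '€' stored as UTF-8 (e2 82 ac) but later decoded as Latin-1 shows
# up as three junk characters; re-encoding and re-decoding reverses it.
garbled = "â\x82¬"                           # mojibake form of '€'
fixed = garbled.encode("latin-1").decode("utf-8")
print(fixed)                                 # €
```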