Is there a known URI scheme or URN namespace for Unicode characters?

I need to reference a Unicode character with a URI. Following the IANA links, I found several schemes and namespaces listed, but nothing about identifiers for Unicode characters. Does anyone know if something like this exists?

I was hoping to find something like

  • unicode://U+0394
  • urn:unicode://0394
  • http://unicode.org/unicode/0394

for the Greek capital letter delta Δ.

If anyone wonders, this is for a semantic web application that uses URIs as identifiers for concepts, including Unicode character concepts.

2 answers

I am afraid there is no URL or URN that refers to reliable information about a Unicode character as a whole. In the Unicode Standard, information about individual characters is partly in the Unicode Character Database (mainly text files in specific formats) and partly in the code charts (PDF files). Neither offers a way to point to a single character. Moreover, the information there is not exhaustive: important notes about individual characters are scattered throughout the standard.

The Decodeunicode site has individually addressable entries, such as

http://www.decodeunicode.org/en/u+0394

but the amount of information varies greatly from entry to entry and is usually very limited. It is not official, and it currently covers only Unicode 5.0.

The Fileformat.info site is much more systematic, but it is also unofficial. It is mainly limited to formal properties and data derived from them, plus comments extracted from the code charts, instructions for typing the character on Windows, and information about font support, but that's a lot! Example:

http://www.fileformat.info/info/unicode/char/0394/
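
If you want to generate such links from a code point, the two URL patterns quoted above can be filled in mechanically. This is only a minimal sketch: `lookup_urls` is a made-up helper name, and I am assuming both sites expect at least four hex digits in lowercase, which the U+0394 examples do not actually confirm for code points containing the letters A to F.

```python
def lookup_urls(code_point: int) -> dict:
    """Build unofficial lookup URLs for a Unicode code point.

    The URL patterns come from the two examples in this answer; the
    lowercase, zero-padded hex form is an assumption.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point!r}")
    hex_digits = f"{code_point:04x}"  # e.g. 0x394 -> "0394"
    return {
        "decodeunicode": f"http://www.decodeunicode.org/en/u+{hex_digits}",
        "fileformat": f"http://www.fileformat.info/info/unicode/char/{hex_digits}/",
    }


if __name__ == "__main__":
    # Greek capital letter delta, as in the question.
    for site, url in lookup_urls(0x0394).items():
        print(site, url)
```

Keep in mind that these are links to third-party pages about a character, not identifiers defined by the Unicode Consortium.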


Well, there is a URL that links to authoritative information in the Unicode database, although (as the other answer says) it does not cover all the information about one particular character.

The following URL points to the latest Unicode database. It is a plain list of the currently valid Unicode characters. Some upcoming characters are missing (㋿), and you should expect the file to be volatile.

The content looks like this, which is not very convenient to use as is.

 $ grep -ai kangaroo UnicodeData.txt -C 7
 1F991;SQUID;So;0;ON;;;;;N;;;;;
 1F992;GIRAFFE FACE;So;0;ON;;;;;N;;;;;
 1F993;ZEBRA FACE;So;0;ON;;;;;N;;;;;
 1F994;HEDGEHOG;So;0;ON;;;;;N;;;;;
 1F995;SAUROPOD;So;0;ON;;;;;N;;;;;
 1F996;T-REX;So;0;ON;;;;;N;;;;;
 1F997;CRICKET;So;0;ON;;;;;N;;;;;
 1F998;KANGAROO;So;0;ON;;;;;N;;;;;
 1F999;LLAMA;So;0;ON;;;;;N;;;;;
 1F99A;PEACOCK;So;0;ON;;;;;N;;;;;
 1F99B;HIPPOPOTAMUS;So;0;ON;;;;;N;;;;;
 1F99C;PARROT;So;0;ON;;;;;N;;;;;
 1F99D;RACCOON;So;0;ON;;;;;N;;;;;
 1F99E;LOBSTER;So;0;ON;;;;;N;;;;;
 1F99F;MOSQUITO;So;0;ON;;;;;N;;;;;
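
Each line of UnicodeData.txt is a semicolon-separated record, so it is easy to turn into something more usable. Here is a rough sketch that reads only the first three fields (code point, name, general category); `UcdRecord` and `parse_unicode_data_line` are names I made up for illustration, and the check for 15 fields reflects my understanding of the file format rather than anything shown in the grep output beyond these sample lines.

```python
from typing import NamedTuple


class UcdRecord(NamedTuple):
    code_point: int          # e.g. 0x1F998
    name: str                # e.g. "KANGAROO"
    general_category: str    # e.g. "So" (Symbol, other)


def parse_unicode_data_line(line: str) -> UcdRecord:
    """Parse one semicolon-separated line of UnicodeData.txt."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    # Field 0 is the code point in hex, field 1 the character name,
    # field 2 the general category; the rest are ignored here.
    return UcdRecord(int(fields[0], 16), fields[1], fields[2])


if __name__ == "__main__":
    record = parse_unicode_data_line("1F998;KANGAROO;So;0;ON;;;;;N;;;;;")
    print(f"U+{record.code_point:04X} {record.name} ({record.general_category})")
```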

You could create a hacky “hash-based” namespace with this suffix, but this is definitely non-standard.
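
To make the idea concrete, here is one way such a fragment-based identifier could be minted and read back. Everything about it is an assumption on my part: `BASE` is a placeholder address and not the real location of the file, the four-digit uppercase hex fragment is just one possible convention, and `mint_uri` and `code_point_from_uri` are hypothetical helpers.

```python
from urllib.parse import urldefrag

# Placeholder base URL (assumption); substitute the address of the
# UnicodeData.txt file you actually want to anchor the identifiers to.
BASE = "https://example.org/UnicodeData.txt"


def mint_uri(code_point: int) -> str:
    """Return a non-standard, fragment-based URI for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point!r}")
    return f"{BASE}#{code_point:04X}"


def code_point_from_uri(uri: str) -> int:
    """Recover the code point from a URI produced by mint_uri."""
    _, fragment = urldefrag(uri)  # splits off the part after '#'
    return int(fragment, 16)


if __name__ == "__main__":
    uri = mint_uri(0x0394)
    print(uri, "->", chr(code_point_from_uri(uri)))
```

No server or standard gives the fragment any meaning, so this only works as a private convention inside your own application.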
