Is a string in a compatibility normalization form already in the corresponding canonical normalization form?

My tests tell me that in Unicode 6.2, every character that appears in a full compatibility decomposition has the property NFD_Quick_Check = Yes.

This makes me think that isNFKD(x) implies isNFD(x), and that isNFKC(x) implies isNFC(x).

Are my conclusions right? What about stability? Are these consequences guaranteed for future versions of the Unicode standard?
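
For anyone who wants to repeat this kind of test, here is a minimal Python sketch. It assumes Python's standard `unicodedata` module, which uses whatever Unicode version ships with the interpreter (not necessarily 6.2). The helper names `is_normalized` and `find_counterexamples` are made up for illustration. It only looks at single code points, so it is a sanity check rather than a proof.

```python
import sys
import unicodedata


def is_normalized(form, s):
    # Portable check: a string is in a form if normalizing it changes nothing.
    return unicodedata.normalize(form, s) == s


def find_counterexamples():
    # Collect code points whose NFKD result is not NFD,
    # or whose NFKC result is not NFC.
    bad = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        ch = chr(cp)
        nfkd = unicodedata.normalize("NFKD", ch)
        nfkc = unicodedata.normalize("NFKC", ch)
        if not is_normalized("NFD", nfkd) or not is_normalized("NFC", nfkc):
            bad.append(cp)
    return bad


counterexamples = find_counterexamples()
print("Unicode data version:", unicodedata.unidata_version)
print("Counterexamples found:", len(counterexamples))
```

An empty list is consistent with the implication for the Unicode version bundled with the interpreter; it says nothing about other versions.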

2 answers

Your conclusions are correct. The Design Goals section of Unicode Standard Annex #15 states the following:

toNFKC(x) = toNFC(toNFKC(x))
toNFKD(x) = toNFD(toNFKD(x))
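
As a spot check of these two identities on multi-character strings, here is a small Python sketch. It assumes the standard `unicodedata` module; `to_form`, `identities_hold`, and the `SAMPLES` list are illustrative names and data chosen here, not anything taken from the annex.

```python
import unicodedata


def to_form(form, s):
    # Thin wrapper so the code reads like the toNFC/toNFKC notation.
    return unicodedata.normalize(form, s)


# A few strings mixing compatibility characters and combining marks.
SAMPLES = [
    "\ufb01",        # LATIN SMALL LIGATURE FI
    "e\u0301",       # e + COMBINING ACUTE ACCENT
    "\u212b",        # ANGSTROM SIGN
    "\u1e9b\u0323",  # LONG S WITH DOT ABOVE + COMBINING DOT BELOW
    "\uff76\uff9e",  # halfwidth katakana KA + halfwidth voiced sound mark
]


def identities_hold(s):
    # toNFKC(x) == toNFC(toNFKC(x)) and toNFKD(x) == toNFD(toNFKD(x))
    nfkc = to_form("NFKC", s)
    nfkd = to_form("NFKD", s)
    return to_form("NFC", nfkc) == nfkc and to_form("NFD", nfkd) == nfkd


results = {s: identities_hold(s) for s in SAMPLES}
for s, ok in results.items():
    print([f"U+{ord(c):04X}" for c in s], ok)
```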

As for stability, this will remain true in future versions of Unicode as long as the normalized string contains no unassigned code points.

+1
source

I found the following here:

In other words, the composition phase of NFC and NFKC is the same; only their decomposition phase differs, with NFKC applying compatibility decompositions.

Then there is the following:

There are two forms of normalization that convert to composite characters: Normalization Form C and Normalization Form KC. The difference between them depends on whether the resulting text is to be a canonical equivalent of the original unnormalized text or a compatibility equivalent of it. (In NFKC and NFKD, a K is used to stand for compatibility, to avoid confusion with the C standing for composition.) Both types of normalization can be useful in different circumstances.

In the first three figures, the NFKD form is always the same as the NFD form, and the NFKC form is always the same as the NFC form, so for simplicity those columns are omitted.

This is what I could pick out of the text that might shed light on at least part of your question. Hope this helps.

There is also this table in the Wikipedia article:

NFD (Normalization Form Canonical Decomposition): characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.

NFC (Normalization Form Canonical Composition): characters are decomposed and then recomposed by canonical equivalence.

NFKD (Normalization Form Compatibility Decomposition): characters are decomposed by compatibility, and multiple combining characters are arranged in a specific order.

NFKC (Normalization Form Compatibility Composition): characters are decomposed by compatibility, then recomposed by canonical equivalence.

Looking at these descriptions, I don't think you can conclude that one implies the other. NFD decomposes by canonical equivalence, while NFKD decomposes by compatibility.

The same article also says:

The equivalence criteria can be either canonical (NF) or compatibility (NFK).

To me, this means that it is either canonical or compatibility. NFD and NFKD do different things.
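
To see the four forms side by side, here is a short Python sketch. It assumes the standard `unicodedata` module; the helper `all_forms` and the sample string (the ligature U+FB01 followed by U+00E9) are just picked for illustration.

```python
import unicodedata


def all_forms(s):
    # Normalize the same input into each of the four forms.
    return {
        form: unicodedata.normalize(form, s)
        for form in ("NFD", "NFC", "NFKD", "NFKC")
    }


# LATIN SMALL LIGATURE FI followed by LATIN SMALL LETTER E WITH ACUTE
forms = all_forms("\ufb01\u00e9")
for name, value in forms.items():
    print(name, [f"U+{ord(c):04X}" for c in value])
```

On this input, the canonical forms leave the ligature alone and only decompose or recompose the accented letter, while the compatibility forms also replace the ligature with the two letters f and i.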


In its introductory notes, the article says:

For all versions, even prior to Unicode 4.1, the following policy applies:

A normalized string is guaranteed to be stable; that is, once normalized, a string remains normalized according to all future versions of Unicode.

0
source
