Wow. On the one hand, I'm very pleased to see university courses teaching the reality that character encodings are hard work; on the other hand, expecting students to know the UTF-8 encoding rules by heart sounds like a lot. (Will it help students pass the Turkey Test?)
The clearest description I've seen so far of the rules for encoding UCS code points in UTF-8 comes from the utf-8(7) man page on many Linux systems:
Encoding
The following byte sequences are used to represent a character. The sequence to be used depends on the UCS code number of the character:

0x00000000 - 0x0000007F:  0xxxxxxx
0x00000080 - 0x000007FF:  110xxxxx 10xxxxxx
0x00000800 - 0x0000FFFF:  1110xxxx 10xxxxxx 10xxxxxx
0x00010000 - 0x001FFFFF:  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
[... removed obsolete five and six byte forms ...]

The xxx bit positions are filled with the bits of the character code number in binary representation. Only the shortest possible multibyte sequence which can represent the code number of the character can be used. The UCS code values 0xd800-0xdfff (UTF-16 surrogates) as well as 0xfffe and 0xffff (UCS noncharacters) should not appear in conforming UTF-8 streams.
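As a sketch of how the table picks a sequence length, here is a small Python function. `utf8_length` is a hypothetical name of my own; it follows the man page's 0x1FFFFF limit for the four-byte form (Unicode itself stops at U+10FFFF), rejects surrogates, and for simplicity does not flag the 0xFFFE/0xFFFF noncharacters.

```python
def utf8_length(cp: int) -> int:
    """Bytes needed for code point cp, using the shortest form from the table.

    Hypothetical helper; the five- and six-byte forms are omitted, as in the excerpt.
    """
    if cp < 0:
        raise ValueError("negative code point")
    if 0xD800 <= cp <= 0xDFFF:
        # UTF-16 surrogates must not appear in conforming UTF-8 streams.
        raise ValueError(f"surrogate U+{cp:04X} cannot be encoded")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    if cp <= 0x1FFFFF:
        return 4
    raise ValueError(f"0x{cp:X} is beyond the four-byte form")


for cp in (0x41, 0xE9, 0x4E3E, 0x1F600):
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```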
It might be easier to remember the "compressed" version of the diagram:
The lead byte of a multibyte ("mangled") code point starts with a run of 1s, one per byte in the sequence, followed by a 0 as padding. Subsequent (continuation) bytes start with 10.
0x80      5 bits, one continuation byte
0x800     4 bits, two continuation bytes
0x10000   3 bits, three continuation bytes
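To show that the compressed rule really is enough to rebuild the full diagram, here is a short Python sketch. `template` is a hypothetical helper of my own: for an n-byte sequence the lead byte is n ones, a 0, then 7 - n payload bits, and each continuation byte is 10 plus 6 payload bits.

```python
def template(n: int) -> str:
    """Bit template for an n-byte UTF-8 sequence (n = 1..4), built from the rule."""
    if n == 1:
        return "0xxxxxxx"  # ASCII: a single 0 bit, then 7 payload bits
    # Lead byte: n ones, a 0 separator, then the remaining 8 - n - 1 payload bits.
    lead = "1" * n + "0" + "x" * (8 - n - 1)
    # Each continuation byte: 10 followed by 6 payload bits.
    return " ".join([lead] + ["10xxxxxx"] * (n - 1))


for n in range(1, 5):
    print(template(n))
```

Running it prints the four rows of the man page table, from `0xxxxxxx` down to `11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`.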
You can get the range boundaries by counting how many payload bits each form can hold:
2**(5+1*6) == 2048 == 0x800
2**(4+2*6) == 65536 == 0x10000
2**(3+3*6) == 2097152 == 0x200000
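The same arithmetic can be checked in Python. In this sketch `FORMS` and `upper_bounds` are hypothetical names; each entry pairs the payload bits in the lead byte with the number of continuation bytes, and I added the one-byte form (7 bits, no continuation bytes) so the first boundary, 0x80, falls out too.

```python
# (payload bits in lead byte, continuation bytes), one entry per form.
FORMS = [(7, 0), (5, 1), (4, 2), (3, 3)]


def upper_bounds() -> list:
    """First code point that no longer fits each form: 2 ** (total payload bits)."""
    return [2 ** (lead + 6 * cont) for lead, cont in FORMS]


for (lead, cont), bound in zip(FORMS, upper_bounds()):
    print(f"2**({lead}+{cont}*6) == {bound} == 0x{bound:X}")
```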
I find it easier to remember the rules for building the diagram than the diagram itself. Here's hoping you're good at remembering rules too. :)
Update
Once you've built the diagram above, you can convert Unicode code points to UTF-8 by finding their range, converting from hexadecimal to binary, placing the bits according to the rules above, and then converting back to hex:
U+4E3E
This falls in the range 0x00000800 - 0x0000FFFF (0x4E3E < 0xFFFF), so the encoding takes the form:
1110xxxx 10xxxxxx 10xxxxxx
0x4E3E is 100111000111110b. Drop the bits into the x positions above, starting from the right (any missing bits at the beginning will be filled with 0):
1110x100 10111000 10111110
There is one x position left at the beginning, which is filled with 0:
11100100 10111000 10111110
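The "fill from the right, pad the left with 0" step can be sketched in Python as well. `payload_groups` is a hypothetical helper: it zero-pads the binary form of the code point to the total payload width of an n-byte sequence, then slices it into the lead-byte field and the 6-bit continuation fields. The leading `0` in the first group is the leftover x filled with 0.

```python
def payload_groups(cp: int, n: int) -> list:
    """Split cp's bits into the payload fields of an n-byte sequence (n = 1..4)."""
    lead_bits = 7 if n == 1 else 7 - n
    total = lead_bits + 6 * (n - 1)
    bits = format(cp, f"0{total}b")  # binary string, zero-padded on the left
    if len(bits) > total:
        raise ValueError("code point does not fit in this form")
    groups = [bits[:lead_bits]]
    for i in range(lead_bits, total, 6):
        groups.append(bits[i:i + 6])
    return groups


print(format(0x4E3E, "b"))        # 100111000111110
print(payload_groups(0x4E3E, 3))  # ['0100', '111000', '111110']
```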
Convert from binary to hex:
0xE4 0xB8 0xBE
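Putting the whole procedure together, here is a minimal Python sketch of a hand-rolled encoder (`encode_utf8` is a hypothetical name, not a library function). It uses shifts and masks instead of bit strings: the low 6 bits go into each continuation byte, and whatever remains goes into the lead byte after the prefix of n ones. Unlike the man page table, it stops at U+10FFFF so the result can be compared with Python's own `str.encode("utf-8")`.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode a single code point to UTF-8 by hand, following the steps above."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"cannot encode 0x{cp:X}")
    if cp < 0x80:
        return bytes([cp])  # one byte: 0xxxxxxx
    n = 2 if cp < 0x800 else 3 if cp < 0x10000 else 4
    out = []
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))  # continuation byte: 10 + low 6 bits
        cp >>= 6
    lead_prefix = (0xFF << (8 - n)) & 0xFF  # n ones followed by zeros
    out.append(lead_prefix | cp)  # remaining high bits go into the lead byte
    return bytes(reversed(out))  # bytes were collected right to left


print(encode_utf8(0x4E3E).hex(" "))  # e4 b8 be
```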