Text Encoding in ID3v2.3 Tags

Thanks to this site and several others, I wrote some simple code for reading ID3v2.3 tags from MP3 files. It was a great learning experience, as I previously knew nothing about hex, bytes, etc.

I can successfully read the data, but I ran into a problem that I believe is related to the encoding used. I realized that text frames have a byte at the beginning of the “text” that describes the encoding used, and potentially more information in the next two bytes ...

Example: the data in a TIT2 frame begins with the byte $03 (hex) before the actual text. The text displays correctly, albeit with an extra character at the beginning, using Encoding.ASCII.GetString.

In another MP3, the TIT2 data starts with $01, followed by $FF $FE, which I believe has something to do with Unicode? The text itself is broken up, though: there is a $00 between each text character, and this stops the data from being displayed in Windows (as soon as a $00 is encountered the text just stops, so I get the first character and that is it). I tried using Encoding.Unicode.GetString, but that just seems to return gibberish.

Printing this data to the console seems to work, with spaces between each character, so reading the data itself works correctly.

I read the official documentation for ID3v2.3, but I think I'm just not good enough to understand the text encoding section.

Any answers or links to articles that may be helpful would be greatly appreciated!

Regards, Ross


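"Data from a TIT2 frame begins with the byte $03 (hex) before the actual text. The text displays correctly, albeit with an extra character at the beginning, using Encoding.ASCII.GetString"

Encoding 0x03 is UTF-8, so you should use Encoding.UTF8.GetString. The extra character may be a U+FEFF Byte Order Mark, which is sometimes placed at the start of text to distinguish UTF-16LE from UTF-16BE... it serves no purpose in UTF-8, but some Windows applications write it anyway.

UTF-8 is an ID3v2.4 encoding; it is not defined in 2.3, so strictly speaking this tag is not valid. Real-world ID3 tags often ignore such rules.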

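"In another MP3, the TIT2 data starts with $01, followed by $FF $FE, which I believe has something to do with Unicode? The text itself is broken up, though: there is a $00 between each text character"

That is UTF-16LE, the "two-byte" encoding that Windows calls "Unicode". Because every code unit takes two bytes, characters in the range U+0000 to U+00FF come out with a zero byte after each one. The 0xFF 0xFE prefix is the little-endian byte order mark. Encoding.Unicode.GetString should decode this; perhaps it is stumbling over the BOM?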
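To make the BOM point concrete, here is a minimal sketch of decoding the payload of a $01 frame (the bytes after the encoding byte). The class and method names, `BomDemo` and `DecodeUtf16WithBom`, are hypothetical, and falling back to little-endian when no BOM is present is an assumption of this sketch rather than something the tag format guarantees. The only library calls used are `Encoding.Unicode`, `Encoding.BigEndianUnicode` and `GetString`.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: decodes the text of a $01 frame, i.e. the bytes
    // that follow the encoding byte, using the BOM to pick the byte order.
    public static string DecodeUtf16WithBom(byte[] payload)
    {
        // FF FE: little-endian BOM, skip it and decode as UTF-16LE.
        if (payload.Length >= 2 && payload[0] == 0xFF && payload[1] == 0xFE)
            return Encoding.Unicode.GetString(payload, 2, payload.Length - 2);

        // FE FF: big-endian BOM, skip it and decode as UTF-16BE.
        if (payload.Length >= 2 && payload[0] == 0xFE && payload[1] == 0xFF)
            return Encoding.BigEndianUnicode.GetString(payload, 2, payload.Length - 2);

        // No BOM found: assume little-endian (an assumption of this sketch).
        return Encoding.Unicode.GetString(payload);
    }
}
```

Note that `Encoding.Unicode.GetString` called on the whole payload does not drop the BOM: the result simply starts with the character U+FEFF, which is why the BOM has to be skipped by hand.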

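"Printing this data to the console seems to work, with spaces between each character, so reading the data itself works correctly"

Yes. If you decode it as ASCII, every zero byte becomes a NUL character, which the console typically shows as a blank, while Windows controls treat it as the end of the string, so the text stops after the first character.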

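0x02 is UTF-16BE without a BOM (none is needed, since the byte order is fixed), and 0x00 is ISO-8859-1, an ASCII superset. In practice it is often really the Windows ANSI code page, Encoding.GetEncoding(1252), rather than pure 8859-1.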


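The encoding byte can take these values:

00 - ISO-8859-1 (a superset of ASCII).

01 - UCS-2 (UTF-16 encoded Unicode with a BOM), in ID3v2.2 and ID3v2.3.

02 - UTF-16BE encoded Unicode without a BOM, in ID3v2.4.

03 - UTF-8 encoded Unicode, in ID3v2.4.

Source: http://en.wikipedia.org/wiki/ID3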
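The list above maps naturally onto a small lookup. The following is only a sketch: `Id3TextEncoding` and `FromEncodingByte` are hypothetical names, and for $01 the caller is still assumed to inspect and skip the BOM, since the returned `Encoding.Unicode` is merely the little-endian default.

```csharp
using System;
using System.Text;

public static class Id3TextEncoding
{
    // Hypothetical helper: maps the first byte of a text frame to a .NET Encoding.
    public static Encoding FromEncodingByte(byte encodingByte)
    {
        switch (encodingByte)
        {
            case 0x00: return Encoding.GetEncoding("ISO-8859-1");
            case 0x01: return Encoding.Unicode;          // UTF-16 with BOM; caller checks the BOM
            case 0x02: return Encoding.BigEndianUnicode; // UTF-16BE without BOM (ID3v2.4)
            case 0x03: return Encoding.UTF8;             // ID3v2.4
            default:
                throw new ArgumentOutOfRangeException(
                    nameof(encodingByte), "Unknown ID3v2 text encoding byte");
        }
    }
}
```

Throwing on an unknown byte is a design choice of this sketch; a more forgiving reader might fall back to ISO-8859-1 instead.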


Thanks, reading both the Unicode and the ASCII frames now works!

One question: I was expecting Encoding.Unicode.GetString() to handle the BOM, but that doesn't seem to be the case. I suppose you need to read these bytes and process the data yourself? I just skip the 2 BOM bytes when the text is Unicode; see below.

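using System.Text;

public class Frame
{
    FrameHeader _header;
    public string data;
    public string name;

    public Frame(FrameHeader frm, byte[] bytes)
    {
        _header = frm;
        name = _header._name;

        // Everything except APIC is treated as a text frame here:
        // one encoding byte followed by the text itself.
        if (!name.Equals("APIC") && bytes.Length > 0)
        {
            int encoding = bytes[0];
            int start = 1; // the text begins right after the encoding byte
            Encoding decoder;

            if (encoding == 1)
            {
                // $01: UTF-16 with a BOM. GetString does not strip the BOM,
                // so use it to pick the byte order and then skip it.
                bool bomLE = bytes.Length >= 3 && bytes[1] == 0xFF && bytes[2] == 0xFE;
                bool bomBE = bytes.Length >= 3 && bytes[1] == 0xFE && bytes[2] == 0xFF;
                decoder = bomBE ? Encoding.BigEndianUnicode : Encoding.Unicode;
                if (bomLE || bomBE)
                    start = 3;
            }
            else if (encoding == 2)
                decoder = Encoding.BigEndianUnicode; // UTF-16BE, no BOM (ID3v2.4)
            else if (encoding == 3)
                decoder = Encoding.UTF8;             // ID3v2.4
            else
                decoder = Encoding.GetEncoding("ISO-8859-1");

            // Decode in place instead of copying into a second array,
            // and drop any trailing NUL terminator.
            data = decoder.GetString(bytes, start, bytes.Length - start).TrimEnd('\0');
        }
    }
}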