Extract ID3 tags from a partially downloaded MP3 URL using Python

I need to extract ID3 tags and metadata from remote MP3 files.

I wrote a few lines that get the ID3 tags of a local file:

```python
from mutagen.mp3 import MP3
import urllib2

audio = MP3("Whistle.mp3")
songtitle = audio["TIT2"]
artist = audio["TPE1"]
print "Title: " + str(songtitle)
print "Artist: " + str(artist)
```

I need to do the same for links to MP3 files. I tried a partial download using urllib2:

```python
import urllib2
from mutagen.mp3 import MP3

req = urllib2.Request('http://www.1songday.com/wp-content/uploads/2013/08/Lorde-Royals.mp3')
req.headers['Range'] = 'bytes=%s-%s' % (0, 100)
response = urllib2.urlopen(req)
headers = response.info()
print headers.type
print headers.maintype
data = response.read()
print len(data)
```

How can I extract ID3 tags from an MP3 URL without fully downloading a file?

2 answers

In your example, the 100 bytes you download don't contain the complete ID3 tags, so there is nothing to extract them from.

I played around a bit after reading the ID3 spec; here is a good way to get started:

```python
# Search for ID3v1 tags (note: ID3v1 lives in the last 128 bytes of the file)
tagIndex = data.find('TAG')
if tagIndex >= 0:
    if data[tagIndex+3] == '+':
        print "Found extended ID3v1 tag!"
        # Extended tag: 'TAG+' followed by a 60-byte title
        title = data[tagIndex+4:tagIndex+64]
        print title
    else:
        print "Found ID3v1 tags"
        # Standard tag: 'TAG' followed by a 30-byte title
        title = data[tagIndex+3:tagIndex+33]
        print title
    # and so on
else:
    # Look for ID3v2 tags
    if 'TCOM' in data:
        composerIndex = data.find('TCOM')
        # and so on. See Wikipedia for a full list of frame specifications
```
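Because an ID3v1 tag is a fixed 128-byte block at the very end of the file, it can be parsed with `struct` instead of searching for `'TAG'` anywhere in the data (which can also match random bytes in the audio). A minimal sketch, assuming Python 3 and that `block` already holds the last 128 bytes of the file; `parse_id3v1` is a hypothetical helper, not part of mutagen:

```python
import struct


def parse_id3v1(block):
    """Parse a 128-byte ID3v1 trailer (the last 128 bytes of an MP3).

    Returns a dict of fields, or None if the block is not an ID3v1 tag.
    """
    if len(block) != 128 or block[:3] != b"TAG":
        return None
    # Layout: 'TAG', title(30), artist(30), album(30), year(4), comment(30), genre(1)
    _, title, artist, album, year, comment, genre = struct.unpack(
        "3s30s30s30s4s30sB", block
    )

    def text(raw):
        # Fields are padded with NUL bytes (sometimes spaces); ID3v1 text is ISO-8859-1
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre": genre,  # numeric genre index
    }
```

To get that block without downloading the whole file, you can send an HTTP suffix range (`Range: bytes=-128`), provided the server supports range requests.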

ID3 tags are stored in an ID3 metadata block that usually sits in front of the MP3 frames (which contain the audio), but the MP3 standard also allows the tags to follow the MP3 frames.

To download the minimum number of bytes, you need to:

  • download the first 10 bytes of the MP3, parse the ID3v2 header and compute the ID3v2 tag size
  • download the first size + 10 bytes of the MP3 to get the full ID3v2 tag (the size field does not include the 10-byte header)
  • use a Python library to extract the ID3 tags
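The first two steps boil down to decoding the size field of the 10-byte header. A minimal sketch, assuming `header` holds the first 10 bytes of the file; `id3v2_bytes_to_fetch` is a hypothetical helper name. The size is a "syncsafe" integer: only the low 7 bits of each of bytes 6 to 9 are used.

```python
def id3v2_bytes_to_fetch(header):
    """Return how many leading bytes of the file hold the whole ID3v2 tag.

    `header` is the first 10 bytes of the file.
    """
    if len(header) < 10 or header[:3] != b"ID3":
        raise ValueError("no ID3v2 tag at the start of the file")
    hdr = bytearray(header[:10])
    # Bytes 6-9: syncsafe size, 7 bits per byte, most significant byte first
    size = 0
    for byte in hdr[6:10]:
        size = (size << 7) | (byte & 0x7F)
    # The size excludes the 10-byte header itself
    total = size + 10
    # ID3v2.4 only: flag bit 0x10 in byte 5 means a 10-byte footer follows the tag
    if hdr[3] == 4 and hdr[5] & 0x10:
        total += 10
    return total
```

For example, an ID3v2.3 header ending in the size bytes `00 00 02 01` encodes 2 * 128 + 1 = 257, so `id3v2_bytes_to_fetch(b"ID3\x03\x00\x00\x00\x00\x02\x01")` returns 267.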

Here's a script (Python 2 or 3) that retrieves the album art with a minimal download size:

```python
try:
    import urllib2 as request  # python 2
except ImportError:
    from urllib import request  # python 3

from functools import reduce
import sys
from io import BytesIO

from mutagen.mp3 import MP3

url = sys.argv[1]


def get_n_bytes(url, size):
    """Download only the first `size` bytes of `url` with an HTTP Range request."""
    req = request.Request(url)
    req.add_header('Range', 'bytes=%s-%s' % (0, size - 1))
    response = request.urlopen(req)
    return response.read()


# The ID3v2 header is the first 10 bytes of the file
data = get_n_bytes(url, 10)
if data[0:3] != b'ID3':  # bytes literal so the check also works on Python 3
    raise Exception('ID3 not in front of mp3 file')

# Bytes 6-9 hold the tag size as a syncsafe integer (7 bits per byte)
size_encoded = bytearray(data[-4:])
size = reduce(lambda a, b: a * 128 + b, size_encoded, 0)

header = BytesIO()
# The size excludes the 10-byte header. mutagen needs one full frame
# in order to function, so also add the max frame size
data = get_n_bytes(url, 10 + size + 2881)
header.write(data)
header.seek(0)
f = MP3(header)

if f.tags and 'APIC:' in f.tags.keys():
    artwork = f.tags['APIC:'].data
    with open('image.jpg', 'wb') as img:
        img.write(artwork)
```

A few notes:

  • it checks that an ID3 tag is at the front of the file, i.e. that it is ID3v2
  • the ID3 tag size is stored in bytes 6 through 9 as a syncsafe integer (7 bits per byte), as documented on id3.org
  • unfortunately, mutagen needs one full MP3 audio frame to parse the ID3 tags, so you also need to download one MP3 frame (which is at most 2881 bytes according to this comment)
  • instead of blindly assuming the album cover is a JPEG, you should first check the image format, since ID3 allows many different image types
  • I checked it against about 10 random MP3 files from the internet, e.g.: python url.py http://www.fuelfriendsblog.com/listenup/01%20America.mp3
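One way to check the image format before saving is to look at the leading "magic" bytes of the picture data (mutagen's `APIC` frame also carries a `mime` attribute, such as `image/jpeg`, that you can check first, though taggers don't always fill it in correctly). A minimal sketch covering only JPEG, PNG and GIF; `guess_image_extension` is a hypothetical helper:

```python
def guess_image_extension(data):
    """Guess a file extension from an image's leading magic bytes.

    Returns None if the format is not recognised.
    """
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return ".gif"
    return None
```

In the script you could then write the artwork to `'image' + (guess_image_extension(artwork) or '.bin')` instead of the hard-coded `image.jpg`.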
