I am trying to parse lyrics from http://amalgama-lab.com, the largest Russian-language site for song translations, and save them (original and translated) to the audio list of my Vkontakte account. Unfortunately, amalgama does not have an API, so I have to scrape the pages. Here is my code:
import urllib
from BeautifulSoup import BeautifulSoup
import vkontakte

vk = vkontakte.API(token=<SECRET_TOKEN>)
audios = vk.getAudios(count='2')
# {u'artist': u'The Beatles', u'url': u'http://cs4519.vkontakte.ru/u4665445/audio/4241af71a888.mp3', u'title': u'Yesterday', u'lyrics_id': u'2365986', u'duration': 130, u'aid': 166194990, u'owner_id': 173505924}

url = 'http://amalgama.mobi/songs/'
for i in audios:
    print i['artist']
    if i['artist'].startswith('The '):
        url += i['artist'][4:5] + '/' + i['artist'][4:].replace(' ', '_') + '/' + i['title'].replace(' ', '_') + '.html'
    else:
        url += i['artist'][:1] + '/' + i['artist'].replace(' ', '_') + '/' + i['title'].replace(' ', '_') + '.html'
    url = url.lower()
    page = urllib.urlopen(url)
    soup = BeautifulSoup(page.read(), fromEncoding="utf-8")
    texts = soup.findAll('ol', )
    if len(texts) != 0:
        en = texts[0].text  # this!
        ru = texts[1].text  # this!
        vk.get('audio.edit', aid=i['aid'], oid=i['owner_id'], artist=i['artist'], title=i['title'], text=ru, no_search=0)
But .text returns the lyrics as a single string with no line separators:
“Yesterday all my troubles seemed so far away. Now everything looks like they are here to stay. I believe in yesterday. Suddenly I’m not half the person I once was. There a shadow hanging over me suddenly appeared yesterday [Chorus:] Why she had to leave, I don’t know, she wouldn’t say that I said something wrong, now I want it yesterday, love was such an easy game to play, now I need a place to hide. Oh, I believe in "
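One possible way to keep the line breaks is to read each list item separately instead of taking .text of the whole list. The sketch below is only an illustration under several assumptions: it uses BeautifulSoup 4 (the bs4 package, successor of the BeautifulSoup 3 import above) on Python 3, the HTML is a made-up sample rather than the real amalgama markup, and list_lines is a hypothetical helper name. It assumes every lyric line sits in its own li inside the ol.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page: one <ol> per language.
SAMPLE_HTML = """
<ol><li>Yesterday,</li><li>All my troubles seemed so far away</li></ol>
<ol><li>Вчера,</li><li>Все мои проблемы казались такими далёкими</li></ol>
"""


def list_lines(ol):
    """Return the text of each <li> inside an <ol> as a list of stripped lines."""
    return [li.get_text(" ", strip=True) for li in ol.find_all("li")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
texts = soup.find_all("ol")

en_lines = list_lines(texts[0])  # original lyrics, one entry per line
ru_lines = list_lines(texts[1])  # translated lyrics, one entry per line

# Join with newlines to get a string that keeps the separators.
en = "\n".join(en_lines)
print(en)
```

If the lines on the real page are not wrapped in li elements, calling get_text("\n", strip=True) on the ol itself may work instead, since the first argument of get_text is the separator placed between text fragments.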
This is the main problem. Also, what is the best way to save the texts interleaved line by line, like this:
Line 1 (original)
Line 1 (translated)
Line 2 (original)
Line 2 (translated)
Line 3 (original)
Line 3 (translated)
...
Everything I come up with is messy code. Thanks.
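For the alternating layout, one option is to pair the two lists of lines and emit them in turn. This is a minimal sketch, assuming the original and the translation are already available as two Python lists of lines; interleave is a hypothetical helper name and the sample lines are made up. It is written for Python 3, where the function is itertools.zip_longest (on Python 2 the equivalent is itertools.izip_longest).

```python
from itertools import zip_longest


def interleave(original, translated):
    """Alternate original and translated lines: o1, t1, o2, t2, ..."""
    merged = []
    # zip_longest pads the shorter list with "" so no line is dropped
    # when the two versions have a different number of lines.
    for orig_line, trans_line in zip_longest(original, translated, fillvalue=""):
        merged.append(orig_line)
        merged.append(trans_line)
    return "\n".join(merged)


en_lines = ["Yesterday,", "All my troubles seemed so far away"]
ru_lines = ["Вчера,", "Все мои проблемы казались такими далёкими"]

text = interleave(en_lines, ru_lines)
print(text)
```

The resulting string could then be passed as the text argument of the audio.edit call in place of ru.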