Machine learning algorithm for data classification.

I am looking for recommendations on which methods or algorithms I should research to solve the following problem. I currently have an algorithm that clusters similar MP3 files using acoustic fingerprints. Each cluster contains the differing metadata (song / artist / album) of every file in it. For each cluster, I would like to select the "best" song / artist / album metadata that matches an existing row in my database or, if there is no good match, decide to insert a new row.
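To make the "match an existing row or insert a new one" decision concrete, here is a minimal sketch using fuzzy string similarity from Python's standard `difflib`. The tuple layout `(artist, song, album)`, the function names, and the `0.85` threshold are illustrative assumptions, not part of my actual system.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_or_insert(candidate, rows, threshold=0.85):
    """Return the index of the best-matching row, or None to signal an insert.

    `candidate` and every entry of `rows` are (artist, song, album) tuples.
    The threshold is an arbitrary placeholder that would need tuning.
    """
    best_index, best_score = None, 0.0
    for i, row in enumerate(rows):
        # Average the per-field similarities into one score per row.
        score = sum(similarity(c, r) for c, r in zip(candidate, row)) / len(candidate)
        if score > best_score:
            best_index, best_score = i, score
    return best_index if best_score >= threshold else None
```

A return value of `None` means no row was close enough, so a new row would be inserted.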

There is usually some correct metadata in a cluster, but individual files have many kinds of problems:

  • the artist / song is named completely incorrectly, or is just slightly misspelled.
  • the artist / song / album is missing, but the rest of the information is correct.
  • the song is actually a live recording, but only some of the files in the cluster are marked as such.
  • there may be very little metadata, in some cases just the file name, which may be artist - song.mp3, or artist - album - song.mp3, or some other variation.
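For the file-name-only case, a rough sketch of recovering fields from the two layouts mentioned above could look like this. The function name `parse_filename` and the dictionary keys are hypothetical, and any other layout is deliberately left unparsed.

```python
import re
from pathlib import PurePath


def parse_filename(filename):
    """Guess metadata from 'artist - song.mp3' or 'artist - album - song.mp3'.

    Returns a dict with 'artist', 'album' and 'song' keys, or None when the
    name fits neither layout.
    """
    stem = PurePath(filename).stem  # drop the directory and the extension
    # Split only on hyphens surrounded by whitespace, so 'AC-DC' stays intact.
    parts = [p.strip() for p in re.split(r"\s+-\s+", stem) if p.strip()]
    if len(parts) == 2:
        artist, song = parts
        return {"artist": artist, "album": None, "song": song}
    if len(parts) == 3:
        artist, album, song = parts
        return {"artist": artist, "album": album, "song": song}
    return None  # unknown layout: leave it to other evidence in the cluster
```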

A simple voting algorithm works well enough, but I would like something that I can train on a large data set and that can pick up more nuances than what I have now. I would be very grateful for any links to papers or similar projects.
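For reference, the voting baseline is roughly of the following shape. This is a simplified sketch, not my exact code: the record layout (one dict per file) and the `normalize` helper are assumptions made for illustration.

```python
from collections import Counter


def normalize(value):
    """Lower-case and collapse whitespace so trivial variants vote together."""
    return " ".join(value.lower().split())


def vote(cluster, field):
    """Pick the most common value of `field` among the files in a cluster.

    Returns (winning spelling, share of the votes), or (None, 0.0) when no
    file in the cluster has the field.
    """
    counts = Counter()
    spellings = {}
    for record in cluster:
        value = record.get(field)
        if not value:
            continue  # missing metadata casts no vote
        key = normalize(value)
        counts[key] += 1
        spellings.setdefault(key, Counter())[value] += 1
    if not counts:
        return None, 0.0
    key, n = counts.most_common(1)[0]
    # Report the most frequent original spelling of the winning value.
    best_spelling = spellings[key].most_common(1)[0][0]
    return best_spelling, n / sum(counts.values())
```

The vote share returned alongside the winner is the kind of signal a trained model could take as an input feature instead of relying on a fixed rule.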

Thanks!

