Machine learning algorithm for data classification.

Question

Machine learning algorithm for data classification.

I am looking for some recommendations about which methods / algorithms I should research in order to solve the following problem. I currently have an algorithm that groups similar mp3 sound files using acoustic fingerprints. In each cluster, I have all the different metadata (song / artist / album) for each file. For this cluster, I would like to select the "best" song / artist / album metadata that matches an existing row in my database, or if there is no better match, decide to insert a new row.

There are usually some correct metadata for a cluster, but individual files have many types of problems:

Artist / songs are completely incorrectly named or just slightly erroneous.
missing artist / song / album, but the rest of the information is
the song is actually a live recording, but only some of the files in the cluster are marked as such.
there may be very little metadata, in some cases just the file name, which may be artist - song.mp3, or artist - album - song.mp3, or another change

A simple voting algorithm works well enough, but I would like to have something that I can teach a large set of data that can bring up more nuances than what I have now. We will be very grateful for any links to documents or similar projects.

Thank!

+5

machine-learning classification

twk Jun 03 '10 at 15:49

source share

2 answers

Joel Hoff · Answer 1 · 2010-06-05T17:20:02+0000

, .. , "" .

. , ( , , ), . , , .

, , . .

Tansir1 · Answer 2 · 2010-06-03T16:09:48+0000

- "" . , // .

, . , .

http://en.wikipedia.org/wiki/Levenshtein_distance

Machine learning algorithm for data classification.

More articles: