How to do multi-label classification in Apache Spark

I want to do multi-label text classification on a large dataset, but big-data machine learning tools such as Apache Mahout and Spark MLlib do not seem to support it currently. Has anyone done multi-label classification on large datasets before? Is there any plan to integrate multi-label classification into Mahout or Spark in the near future?

1 answer

This document discusses the benefits you get from multi-label prediction, namely:

  1. The ability to take several independent input parameters into account in a single prediction, instead of repeatedly updating your metrics for each nth index you are trying to predict.
  2. Faster computation.

Based on your needs, I would recommend first narrowing your current problem down to a smaller subset of the data. If the performance is still not what you are looking for, build several models, one for each custom group within your dataset.
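One way to read this advice is as a problem transformation: restrict the data to a smaller group of labels, then turn what remains into one binary dataset per label (often called binary relevance), so that each label gets its own model. The sketch below is plain Python and only shows the data preparation step. The toy documents and the helper names `restrict_to_group` and `binary_relevance` are made up for illustration and do not come from Spark or Mahout. Each resulting binary dataset could then be handed to any binary classifier, for example logistic regression in Spark MLlib, though I have not benchmarked that at scale.

```python
from typing import Dict, List, Set, Tuple

Doc = Tuple[str, Set[str]]  # (text, labels attached to that text)

# Hypothetical toy data, invented for illustration only.
DOCS: List[Doc] = [
    ("spark cluster tuning", {"spark", "ops"}),
    ("mahout recommender", {"mahout"}),
    ("spark mllib pipeline", {"spark", "ml"}),
]


def restrict_to_group(docs: List[Doc], group: Set[str]) -> List[Doc]:
    """Keep only the labels in `group`; drop documents left with no label."""
    kept = []
    for text, labels in docs:
        overlap = labels & group
        if overlap:
            kept.append((text, overlap))
    return kept


def binary_relevance(docs: List[Doc]) -> Dict[str, List[Tuple[str, int]]]:
    """Build one binary (text, 0/1) dataset per label seen in `docs`."""
    all_labels = sorted({label for _, labels in docs for label in labels})
    return {
        label: [(text, int(label in labels)) for text, labels in docs]
        for label in all_labels
    }


if __name__ == "__main__":
    # Narrow the problem to a smaller label group first, then split per label.
    subset = restrict_to_group(DOCS, {"spark", "ml"})
    for label, rows in binary_relevance(subset).items():
        print(label, rows)
```

Binary relevance ignores correlations between labels, so treat it as a baseline: it is simple to parallelize because every per-label model trains independently, but it may underperform methods that model labels jointly.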

I am still facing this challenge myself (4 years after your post ...).

Here is a list of useful articles I have compiled while trying to solve this problem:
