I found the explanation in this issue about a similar linear model: https://github.com/lensacom/sparkit-learn/issues/49
"Unfortunately, this error is a consequence of how Sparkit works. Sparkit fits a linear sklearn model on each block of the data, then averages those models in the reduce stage. At least one of your blocks contains only one of the labels. To check, try the following:
train_Z[:, 'y']._rdd.map(lambda x: np.unique(x).size).filter(lambda x: x < 2).count()
As a workaround, you can shuffle the training data to avoid single-label blocks, but a proper fix is still pending."
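To see the failure mode without a Spark cluster, here is a minimal local sketch. It assumes that contiguous NumPy chunks (np.array_split) behave like Sparkit's blocks; that is an illustration, not Sparkit's actual API. The data is synthetic and sorted by label, so every chunk ends up with a single class, and fitting a scikit-learn LogisticRegression on such a chunk raises the same kind of ValueError.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data sorted by label (assumption: this mimics how the real
# training set was ordered before it was split into blocks)
rng = np.random.RandomState(0)
X = rng.randn(400, 5)
y = np.repeat([0, 1], 200)  # first 200 rows are class 0, the rest class 1

# Stand-in for Sparkit's blocks: contiguous chunks of the arrays
n_blocks = 4
X_blocks = np.array_split(X, n_blocks)
y_blocks = np.array_split(y, n_blocks)

# Local analogue of the RDD check: count blocks with fewer than 2 labels
single_label = sum(np.unique(b).size < 2 for b in y_blocks)
print("single-label blocks:", single_label)  # every block has one class here

# Fitting a per-block model on a single-label block fails
fit_error = None
try:
    LogisticRegression().fit(X_blocks[0], y_blocks[0])
except ValueError as exc:
    fit_error = exc
    print("fit failed:", exc)
```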
EDIT: I found a solution; the error analysis above was correct, and shuffling the data fixes it.
To shuffle both arrays with the same permutation (so each sample keeps its label), I used scikit-learn's utils module:
from sklearn.utils import shuffle
X_shuf, Y_shuf = shuffle(X_transformed, Y)  # same random order applied to X and Y
Then build your training data from these shuffled arrays and train the model again, and it will work!
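A local sketch of why this works, under the same assumptions as before: np.array_split chunks stand in for Sparkit's blocks, the data is synthetic, and the averaging below only imitates the reduce stage described in the quoted issue rather than reproducing Sparkit's code. After sklearn.utils.shuffle the labels are spread across the chunks, so each per-block LogisticRegression fit succeeds and the coefficients can be averaged.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle

# Same kind of label-sorted synthetic data as before (assumption)
rng = np.random.RandomState(0)
X = rng.randn(400, 5)
y = np.repeat([0, 1], 200)

# shuffle applies one permutation to both arrays, so rows stay paired
X_shuf, y_shuf = shuffle(X, y, random_state=0)

n_blocks = 4
X_blocks = np.array_split(X_shuf, n_blocks)
y_blocks = np.array_split(y_shuf, n_blocks)
single_label = sum(np.unique(b).size < 2 for b in y_blocks)
print("single-label blocks after shuffle:", single_label)

# Rough imitation of the reduce stage: one model per block, then average
models = [LogisticRegression().fit(Xb, yb) for Xb, yb in zip(X_blocks, y_blocks)]
coef = np.mean([m.coef_ for m in models], axis=0)
intercept = np.mean([m.intercept_ for m in models], axis=0)
print("averaged coef:", coef, "intercept:", intercept)
```

Passing random_state makes the shuffle reproducible; leave it out if you want a different order on every run.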