I have been studying and practicing with the sklearn library on my own. While competing on Kaggle, I noticed that the provided sample code uses BaseEstimator from sklearn.base. I do not quite understand how or why BaseEstimator is used.
    from sklearn.base import BaseEstimator

    class FeatureMapper:
        def __init__(self, features):
            # features contains feature_name, column_name, and extractor (which is a CountVectorizer)
            self.features = features

        def fit(self, X, y=None):
            for feature_name, column_name, extractor in self.features:
                # My question is: is X the same as features? If yes, where is it assigned?
                # Otherwise, how can X be indexed by column name with X[column_name]?
                extractor.fit(X[column_name], y)
        ...
This is what I usually see on the sklearn tutorial page:
    from sklearn import SomeClassifier

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    Y = [0, 1, 2, 3]
    clf = SomeClassifier()
    clf = clf.fit(X, Y)
I could not find a good example or any documentation on the official sklearn page. Although I found the sklearn.base source code on GitHub, I would like some examples and an explanation of how it is used.
UPDATE
Here is a link to the sample code: https://github.com/benhamner/JobSalaryPrediction/blob/master/features.py

Correction: I realized that BaseEstimator is used for the SimpleTransform class. My first question is: why is it needed, since it is not used anywhere in the computation? My second question is: when defining fit, what is X and how is it assigned? Because usually I see:
    def mymethod(self, X, y=None):
        X = self.features
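
To make the question concrete, here is a minimal sketch of the pattern as I currently understand it, not the code from the linked repository. It rests on several assumptions of mine: FeatureMapper subclasses BaseEstimator (in the sample code only SimpleTransform does), X is a plain dict of columns standing in for a pandas DataFrame, and the column name "Title" and feature name "title_bow" are made up. In this sketch X is never assigned inside the class; it is whatever the caller passes to fit, and the only thing visibly inherited from BaseEstimator is parameter handling such as get_params.

```python
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import CountVectorizer


class FeatureMapper(BaseEstimator):
    def __init__(self, features):
        # features: list of (feature_name, column_name, extractor) tuples
        self.features = features

    def fit(self, X, y=None):
        # X is supplied by the caller; it is not read from self.features
        for feature_name, column_name, extractor in self.features:
            extractor.fit(X[column_name], y)
        return self


# Hypothetical data: a dict of columns standing in for a pandas DataFrame
X = {"Title": ["senior python developer", "junior data analyst"]}

mapper = FeatureMapper([("title_bow", "Title", CountVectorizer())])
mapper.fit(X)

# get_params is inherited from BaseEstimator; it reads the __init__ arguments
params = mapper.get_params(deep=False)

# The fitted extractor now holds the vocabulary learned from X["Title"]
vocabulary = sorted(mapper.features[0][2].vocabulary_)
```

If this reading is right, X[column_name] works only because the caller passes an object that can be indexed by column name, and BaseEstimator matters mainly when the object is cloned or tuned by other sklearn tools.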