BaseEstimator in sklearn.base (Python)

I have been learning the sklearn library on my own. While competing on Kaggle, I noticed that the provided sample code used BaseEstimator from sklearn.base. I don't quite understand how or why BaseEstimator is used.

    from sklearn.base import BaseEstimator

    class FeatureMapper:
        def __init__(self, features):
            # features contains (feature_name, column_name, extractor) tuples,
            # where extractor is a CountVectorizer
            self.features = features

        def fit(self, X, y=None):
            for feature_name, column_name, extractor in self.features:
                # My question: is X the features? If so, where is it assigned?
                # If not, how can X[column_name] select a column?
                extractor.fit(X[column_name], y)
        ...
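On the X question in the snippet above: X is never assigned inside the class. It is whatever object the caller passes to fit(). The sketch below is a hypothetical rewrite (the class name FeatureMapperSketch, the column names and the toy data are made up for illustration and are not taken from the Kaggle code), assuming X is a pandas DataFrame so that X[column_name] selects one text column.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer


class FeatureMapperSketch:
    """Hypothetical sketch: one text extractor per DataFrame column."""

    def __init__(self, features):
        # List of (feature_name, column_name, extractor) tuples.
        self.features = features

    def fit(self, X, y=None):
        # X is not set anywhere in the class: it is the argument the caller
        # passes to fit(), here a DataFrame, so X[column_name] is a column.
        for feature_name, column_name, extractor in self.features:
            extractor.fit(X[column_name], y)
        return self

    def transform(self, X):
        # Transform each column with its fitted extractor and put the
        # sparse count matrices side by side.
        return hstack([extractor.transform(X[column_name])
                       for _, column_name, extractor in self.features])


# Hypothetical toy data; column names are invented for this example.
df = pd.DataFrame({
    "title": ["data engineer", "data scientist", "nurse"],
    "description": ["build pipelines", "build models", "care for patients"],
})
mapper = FeatureMapperSketch([
    ("title_bow", "title", CountVectorizer()),
    ("desc_bow", "description", CountVectorizer()),
])
X_features = mapper.fit(df).transform(df)  # df is the X seen inside fit()
print(X_features.shape)  # (3, 10): 4 title words + 6 description words
```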

This is what I usually see on the sklearn tutorial page:

    from sklearn import SomeClassifier

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]
    Y = [0, 1, 2, 3]
    clf = SomeClassifier()
    clf = clf.fit(X, Y)
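SomeClassifier above is a placeholder. Here is the same pattern with a concrete estimator, DecisionTreeClassifier, chosen arbitrarily for this sketch. It also shows why clf = clf.fit(X, Y) works: scikit-learn's fit() returns the estimator itself, and X is simply the data you pass in.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
Y = [0, 1, 2, 3]

clf = DecisionTreeClassifier(random_state=0)
fitted = clf.fit(X, Y)          # X and Y are plain arguments to fit()
print(fitted is clf)            # True: fit() returns the same object
print(clf.predict([[2, 2]]))    # [2]: a fully grown tree recalls training points
```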

I could not find a good example or any documentation on the official sklearn pages. Although I found the sklearn.base source code on GitHub, I would like some examples and an explanation of how it is used.

UPDATE

Here is a link to the sample code: https://github.com/benhamner/JobSalaryPrediction/blob/master/features.py

Correction: I realized that BaseEstimator is used for the SimpleTransform class. So my first question is: why is it needed, since it is not used anywhere in the calculation? My second question is about X: what does it correspond to, and where is it assigned? Usually I see something like this:

    def mymethod(self, X, y=None):
        X = self.features
        # then do something to X[column_name]
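Inside a method, X is just the parameter; writing X = self.features throws away whatever the caller passed. In the scikit-learn convention, the constructor stores configuration and fit/transform receive the data. A minimal sketch, assuming a SimpleTransform-style class (SimpleTransformSketch and identity are hypothetical names, not the Kaggle code), shows a Pipeline supplying X to each step and BaseEstimator exposing the constructor argument through get_params():

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


def identity(x):
    """Default transformer: return the value unchanged."""
    return x


class SimpleTransformSketch(BaseEstimator):
    """Hypothetical stand-in for a SimpleTransform-style class."""

    def __init__(self, transformer=identity):
        # Store the argument under the same name: BaseEstimator.get_params()
        # looks it up by the __init__ parameter name.
        self.transformer = transformer

    def fit(self, X, y=None):
        # Nothing to learn. X is whatever the caller passed in.
        return self

    def transform(self, X):
        # Apply the function to each element and return a single column.
        return np.array([self.transformer(x) for x in X], ndmin=2).T


words = ["a", "bb", "ccc", "dddd"]  # raw inputs: strings
y = [1.0, 2.0, 3.0, 4.0]
pipe = Pipeline([
    ("length", SimpleTransformSketch(transformer=len)),
    ("reg", LinearRegression()),
])
# Pipeline.fit(words, y) calls the "length" step with X = words, then fits
# LinearRegression on the transformed column of string lengths.
pipe.fit(words, y)
print(pipe.named_steps["length"].get_params())  # {'transformer': <built-in function len>}
print(pipe.predict(["eeeee"]))
```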
1 answer

BaseEstimator provides, among other things, default implementations of the get_params and set_params methods (see the source code). These are what make a model usable in a grid search with GridSearchCV for automatic parameter tuning, and what let it work together with other estimators when they are combined in a Pipeline.
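A minimal sketch of what get_params and set_params buy you, assuming a hypothetical PowerFeature transformer and toy data where y = x**2. Pipeline's step__param naming, GridSearchCV's candidate search and sklearn.base.clone() all go through these two methods. The hypothetical PlainPowerFeature class, which does not inherit from BaseEstimator, cannot be cloned, so GridSearchCV could not use it either.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class PowerFeature(BaseEstimator):
    """Hypothetical transformer: raises every input value to `power`."""

    def __init__(self, power=1):
        self.power = power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) ** self.power


class PlainPowerFeature:
    """Same idea without BaseEstimator: no get_params/set_params."""

    def __init__(self, power=1):
        self.power = power


# Toy data where y = x**2 exactly, so power=2 should score best.
X = np.arange(1, 21, dtype=float).reshape(-1, 1)
y = X.ravel() ** 2

pipe = Pipeline([("feat", PowerFeature()), ("reg", LinearRegression())])

# set_params routes "feat__power" to the PowerFeature step.
pipe.set_params(feat__power=3)
print(pipe.get_params()["feat__power"])  # 3

# GridSearchCV clones the pipeline and calls set_params for each candidate.
search = GridSearchCV(pipe, {"feat__power": [1, 2, 3]}, cv=3)
search.fit(X, y)
print(search.best_params_)  # {'feat__power': 2}

# clone() relies on get_params(), so it raises TypeError for the plain class.
try:
    clone(PlainPowerFeature(power=2))
    clone_failed = False
except TypeError:
    clone_failed = True
print("clone failed:", clone_failed)  # clone failed: True
```

Note that the constructor only stores its arguments under the same names. BaseEstimator.get_params() discovers parameters from the __init__ signature, so renaming or transforming them inside __init__ breaks set_params and clone.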
