How to set parameters for a custom PySpark Transformer, if this is a stage in an equipped ML-line?

Question

How to set parameters for a custom PySpark Transformer, if this is a stage in an equipped ML-line?

I wrote a custom ML Pipeline Estimatorand Transformerfor my own Python algorithm, following the picture here .

However, in this example, all the parameters necessary for _transform()were conveniently transferred to Model / Transformer by the evaluation method _fit(). But my transformer has several parameters that control how the conversion is applied. These parameters are specific to the transformer, so it would be strange to pass them to the evaluator in advance along with the specific parameters for the evaluator used to fit the model.

I can get around this by adding extra Paramsto the transformer. This works great when I use my grade and transformer outside the ML trunk. But how can I set these parameters depending on the transformer as soon as the evaluation object is added as a stage for the conveyor? For example, you can call getStages()on pyspark.ml.pipeline.Pipelineand, therefore, get estimates, but PipelineModelthere is no corresponding method getStages(). I do not see any methods for setting parameters in stages PipelineModel.

So, how can I set the parameters on my transformer before I call transform()the mounted pipeline model? I am on Spark 2.2.0.

0

apache-spark pyspark apache-spark-ml

snark Feb 06 '18 at 12:15

source share

1

snark · Accepted Answer · 2018-03-06T12:20:11+0000

PipelineModel getStages(), stages.

, , , - :

myModel = myPipelineModel.stages[1]
myModel.setMyParam(42)
# Or in one line:
#myPipelineModel.stages[1].setMyParam(42)

# Now we can push our data through the fully configured pipeline model:
resultsDF = myPipelineModel.transform(inputDF)

How to set parameters for a custom PySpark Transformer, if this is a stage in an equipped ML-line?

More articles: