For reference, a hard sigmoid function can be defined differently in different places. In Courbariaux et al. 2016 [1] it is defined as:
σ is the "hard sigmoid" function: σ(x) = clip((x + 1) / 2, 0, 1) = max(0, min(1, (x + 1) / 2))
The goal is to produce a probability value (hence it must lie between 0 and 1) for use in stochastic binarization of neural network parameters (e.g. weights, activations, gradients). You take the probability p = σ(x) returned by the hard sigmoid and set the binarized parameter to +1 with probability p, or to -1 with probability 1 - p.
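As a minimal sketch of this idea (not taken from the paper's code, just an illustration with NumPy and hypothetical function names):

```python
import numpy as np

def hard_sigmoid(x):
    # sigma(x) = clip((x + 1) / 2, 0, 1)
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def stochastic_binarize(x, rng=None):
    # Binarize to +1 with probability p = sigma(x), otherwise -1.
    rng = np.random.default_rng() if rng is None else rng
    p = hard_sigmoid(x)
    return np.where(rng.random(np.shape(x)) < p, 1.0, -1.0)

# Example: values far below -1 almost surely map to -1, values far above +1 to +1,
# and values near 0 are binarized roughly 50/50.
print(stochastic_binarize(np.array([-2.0, -0.3, 0.0, 0.5, 2.0])))
```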
[1] https://arxiv.org/abs/1602.02830 - "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio (submitted February 9, 2016 (v1), last revised March 17, 2016 (v3)).