Why the 6 in relu6?

I have hacked together a deep feed-forward NN from scratch in R, and it seems more stable with "hard sigmoid" activations - max(0, min(1, x)) - than with ReLU. Trying to port it to TensorFlow, I noticed that this activation function is not built in; there is only relu6, which applies an upper cutoff at 6. Is there a reason for this? (I realize that I could do relu6(x * 6) / 6, but if the TF folks put the 6 there for a good reason, I would like to know.) I would also like to know whether others have stability problems with ReLU in feed-forward networks (I am aware of the issues with RNNs).
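For concreteness, this is the workaround I have in mind, as a minimal sketch assuming TensorFlow's Python API in eager mode (the name hard_sigmoid is just my own helper, not a TF function):

```python
import tensorflow as tf

def hard_sigmoid(x):
    # max(0, min(1, x)): scale into relu6's [0, 6] range, clip, scale back down.
    return tf.nn.relu6(x * 6.0) / 6.0

x = tf.constant([-1.0, 0.25, 0.5, 2.0])
print(hard_sigmoid(x).numpy())  # [0.   0.25 0.5  1.  ]
```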

2 answers

From this reddit thread:

This is useful in making the networks ready for fixed-point inference. If you leave the upper limit unbounded, you lose too many bits to the Q part of a Q.f fixed-point number. Keeping the ReLUs bounded by 6 lets them take at most 3 bits (values up to 8), leaving 4-5 bits for the fractional .f part.

So 6 seems to be an essentially arbitrary value, chosen according to the number of bits you want to be able to compress the network's trained parameters into. As for why only the value 6 is implemented, I assume it is because that is the value that fits an 8-bit representation, which is probably the most common case.
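To make the quote concrete, here is a rough sketch of the kind of Q.f fixed-point layout it describes. The Q3.5 split and the NumPy code are my own illustration, not TensorFlow's actual quantization scheme:

```python
import numpy as np

# Illustrative Q3.5 layout: 3 integer bits cover the capped [0, 6] range,
# leaving 5 fractional bits of an unsigned byte for precision.
FRAC_BITS = 5
SCALE = 1 << FRAC_BITS  # 32 quantization steps per unit

def quantize_relu6(x):
    # Clip like relu6, then store as uint8: 6 * 32 = 192 fits in one byte.
    clipped = np.clip(x, 0.0, 6.0)
    return np.round(clipped * SCALE).astype(np.uint8)

def dequantize(q):
    return q.astype(np.float32) / SCALE

acts = np.array([-1.0, 0.7, 3.14, 10.0])
q = quantize_relu6(acts)        # [  0  22 100 192]
print(q, dequantize(q))         # worst-case rounding error is 1/64
```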


From the TensorFlow documentation for relu6 (https://www.tensorflow.org/api_docs/python/tf/nn/relu6), which computes min(max(x, 0), 6) and traces the cap back to the paper linked below:

... First, we cap the units at 6, so our ReLU activation function is y = min(max(x, 0), 6). In our tests, this encourages the model to learn sparse features earlier. In the formulation of [8], each ReLU unit consists of only 6 replicated bias-shifted Bernoulli units, rather than an infinite number, due to the capping of the output at 6. We will refer to ReLU units capped at n as ReLU-n units.

http://www.cs.utoronto.ca/~kriz/conv-cifar10-aug2010.pdf

Since the cap stems from that paper, I suspect they tested it with different values of n and got the best results on their test set with n = 6.
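If you want to experiment with other caps yourself, a general ReLU-n is easy to sketch on top of TensorFlow's built-ins (relu_n is a hypothetical helper name, not part of the API):

```python
import tensorflow as tf

def relu_n(x, n=6.0):
    # y = min(max(x, 0), n); with n = 6 this matches tf.nn.relu6.
    return tf.minimum(tf.nn.relu(x), n)

x = tf.constant([-2.0, 1.0, 4.0, 9.0])
print(relu_n(x, 6.0).numpy())   # [0. 1. 4. 6.] -- same as tf.nn.relu6(x)
print(relu_n(x, 4.0).numpy())   # [0. 1. 4. 4.] -- a ReLU-4 for comparison
```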
