How to do Xavier initialization on TensorFlow

I am migrating my Caffe network to TensorFlow, but it does not seem to have Xavier initialization. I am using truncated_normal, but that seems to make it a lot harder to train.

+51
python tensorflow
Nov 10 '15 at 22:07
6 answers

Starting with version 0.8 there is a Xavier initializer; see here for the documentation.

You can use something like this:

 W = tf.get_variable("W", shape=[784, 256],
                     initializer=tf.contrib.layers.xavier_initializer())
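
For context, a minimal sketch of using such a variable in a fully connected layer (the input placeholder x, the bias b, and the 784/256 sizes are assumptions for illustration, not part of the answer above):

 import tensorflow as tf

 x = tf.placeholder(tf.float32, shape=[None, 784])   # assumed input batch of flattened 28x28 images
 W = tf.get_variable("W", shape=[784, 256],
                     initializer=tf.contrib.layers.xavier_initializer())
 b = tf.get_variable("b", shape=[256], initializer=tf.zeros_initializer())  # biases usually start at zero
 hidden = tf.nn.relu(tf.matmul(x, W) + b)            # dense layer with Xavier-initialized weights
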
+79
Apr 22 '16 at 4:23

@Alef7, Xavier/Glorot initialization depends on the number of incoming connections (fan_in), the number of outgoing connections (fan_out), and the kind of activation function (sigmoid or tanh) of the neuron. See this: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf

So now, to your question. Here's how I would do it in TensorFlow:

 import numpy as np
 import tensorflow as tf

 def xavier_init(shape):
     (fan_in, fan_out) = shape  # for a 2-D weight matrix
     low = -4 * np.sqrt(6.0 / (fan_in + fan_out))  # use 4 for sigmoid, 1 for tanh activation
     high = 4 * np.sqrt(6.0 / (fan_in + fan_out))
     return tf.Variable(tf.random_uniform(shape, minval=low, maxval=high, dtype=tf.float32))

Note that you should sample from a uniform distribution, not the normal distribution suggested in the other answer.
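
For example, with hypothetical layer sizes (784 inputs, 256 outputs), a call to the helper above would look like this:

 W = xavier_init((784, 256))  # uniform Xavier-initialized weights for a 784 -> 256 layer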

By the way, I wrote a post yesterday about something else using TensorFlow that also happens to use Xavier initialization. If you're interested, there is also a Python notebook with a walk-through example: https://github.com/delip/blog-stuff/blob/master/tensorflow_ufp.ipynb

+11
Nov 14 '15 at 17:37

Just to add another example of how to define a tf.Variable initialized using Xavier Glorot and Yoshua Bengio's method:

 graph = tf.Graph()
 with graph.as_default():
     ...
     initializer = tf.contrib.layers.xavier_initializer()
     w1 = tf.Variable(initializer(w1_shape))
     b1 = tf.Variable(initializer(b1_shape))
     ...

This kept me from getting nan values in my loss function due to numerical instabilities when using multiple layers with ReLUs.

+8
Jul 28 '17 at 19:24

A nice wrapper around TensorFlow called prettytensor gives an implementation in its source code (copied directly from here):

 def xavier_init(n_inputs, n_outputs, uniform=True):
   """Set the parameter initialization using the method described.
   This method is designed to keep the scale of the gradients roughly the same
   in all layers.
   Xavier Glorot and Yoshua Bengio (2010):
            Understanding the difficulty of training deep feedforward neural
            networks. International conference on artificial intelligence and
            statistics.
   Args:
     n_inputs: The number of input nodes into each output.
     n_outputs: The number of output nodes for each input.
     uniform: If true use a uniform distribution, otherwise use a normal.
   Returns:
     An initializer.
   """
   if uniform:
     # 6 was used in the paper.
     init_range = math.sqrt(6.0 / (n_inputs + n_outputs))
     return tf.random_uniform_initializer(-init_range, init_range)
   else:
     # 3 gives us approximately the same limits as above since this repicks
     # values greater than 2 standard deviations from the mean.
     stddev = math.sqrt(3.0 / (n_inputs + n_outputs))
     return tf.truncated_normal_initializer(stddev=stddev)
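
One way to use it (the variable name and the 784x256 shape below are just illustrative, and the imports are the ones the snippet above relies on):

 import math
 import tensorflow as tf

 # Create a 784 -> 256 weight matrix using the initializer returned by xavier_init.
 W = tf.get_variable("W", shape=[784, 256],
                     initializer=xavier_init(784, 256, uniform=True))
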
+6
Dec 19 '15 at 10:25

I looked around and could not find anything built in. However, according to this:

http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

Xavier initialization is just sampling a (usually Gaussian) distribution whose variance is a function of the number of neurons. tf.random_normal can do that for you; you just need to compute the stddev (i.e. from the number of neurons represented by the weight matrix you are trying to initialize).
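
A minimal sketch of that idea (the fan_in/fan_out sizes and the sqrt(2 / (fan_in + fan_out)) stddev formula are my own assumptions, not from the answer above):

 import numpy as np
 import tensorflow as tf

 fan_in, fan_out = 784, 256                  # assumed layer sizes
 stddev = np.sqrt(2.0 / (fan_in + fan_out))  # Glorot-style std dev for a normal distribution
 W = tf.Variable(tf.random_normal([fan_in, fan_out], stddev=stddev))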

+3
Nov 12 '15 at

tf.contrib has xavier_initializer. Here is an example of how to use it:

 import tensorflow as tf

 a = tf.get_variable("a", shape=[4, 4],
                     initializer=tf.contrib.layers.xavier_initializer())
 with tf.Session() as sess:
     sess.run(tf.global_variables_initializer())
     print(sess.run(a))

In addition to this, TensorFlow has other initializers:
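
For instance, a few others available in the 1.x API (a non-exhaustive sketch; the concrete arguments are just examples):

 import tensorflow as tf

 trunc_norm = tf.truncated_normal_initializer(stddev=0.1)     # truncated Gaussian
 uniform = tf.random_uniform_initializer(-0.05, 0.05)         # uniform over a fixed range
 constant = tf.constant_initializer(0.0)                      # constant value, e.g. for biases
 scaling = tf.contrib.layers.variance_scaling_initializer()   # He-style / variance scaling

 b = tf.get_variable("b", shape=[256], initializer=constant)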

+3
May 01 '17 at 4:00


