In TensorFlow or Theano, you only tell the library how your neural network is structured and how the forward pass should be computed.
For example, in TensorFlow you would write:
    import numpy as np
    import tensorflow as tf

    # X and y are numpy arrays holding the training data
    graph = tf.Graph()
    with graph.as_default():
        _X = tf.constant(X)
        _y = tf.constant(y)
        hidden = 20

        # hidden layer
        w0 = tf.Variable(tf.truncated_normal([X.shape[1], hidden]))
        b0 = tf.Variable(tf.truncated_normal([hidden]))
        h = tf.nn.softmax(tf.matmul(_X, w0) + b0)

        # output layer
        w1 = tf.Variable(tf.truncated_normal([hidden, 1]))
        b1 = tf.Variable(tf.truncated_normal([1]))
        yp = tf.nn.softmax(tf.matmul(h, w1) + b1)

        # L2 loss and one gradient-descent step
        loss = tf.reduce_mean(0.5 * tf.square(yp - _y))
        optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
I use the L2-norm loss function, C = 0.5 * sum((y - yp)^2), so at the backpropagation stage the derivative dC/dyp = (yp - y) has to be calculated. See equation (30) in this book.
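For concreteness, here is a minimal toy check of that derivative (the shapes and values are made up) using tf.gradients, which returns the symbolic gradient of one tensor with respect to another:

    import numpy as np
    import tensorflow as tf

    y  = tf.constant(np.array([[1.0]], dtype=np.float32))
    yp = tf.placeholder(tf.float32, shape=[1, 1])
    C  = tf.reduce_sum(0.5 * tf.square(y - yp))
    dC = tf.gradients(C, yp)[0]   # symbolic derivative dC/dyp

    with tf.Session() as sess:
        # prints [[-0.8]], which equals yp - y for yp = 0.2, y = 1.0
        print(sess.run(dC, feed_dict={yp: [[0.2]]}))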
My question is: how does TensorFlow (or Theano) know the analytic derivative it needs for backpropagation? Does it make a numerical approximation, or does it somehow avoid using the derivative altogether?
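To make the "approximation" part of the question concrete, this is how I would compare the library's gradient against a finite-difference estimate (again a toy example with made-up values):

    import tensorflow as tf

    x = tf.constant(3.0)
    f = 0.5 * tf.square(x - 1.0)     # C = 0.5 * (x - 1)^2
    df = tf.gradients(f, x)[0]       # the library's derivative of C w.r.t. x

    with tf.Session() as sess:
        analytic = sess.run(df)      # exact value: x - 1 = 2.0

    # central finite difference around x = 3.0
    eps = 1e-4
    numeric = (0.5 * (3.0 + eps - 1.0) ** 2
               - 0.5 * (3.0 - eps - 1.0) ** 2) / (2 * eps)
    print(analytic, numeric)         # both approximately 2.0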
I went through the Udacity course on TensorFlow in depth, but I still don't understand how these libraries work.