After some research, I agree that calling a compiled Theano function from within another Theano function (or a scan step) does not work. The problem with the code comes from how the network is built: following the basic design of the deep learning tutorials, the first layer of the network defines its own symbolic variable as input, and each higher layer builds its output on top of the layer below it, until the cost is finally computed on top of the last layer. The tutorials use code similar to ...
    class layer1(object):
        def __init__(self):
            # self.W and self.b are the layer's shared parameters (defined elsewhere)
            self.x = T.matrix()
            self.output = activation(T.dot(self.x, self.W) + self.b)
For me, the issue is that the symbolic input variable (layer1.x) needs to point at a new slice of data every time scan takes a step. The givens parameter of theano.function does exactly this, but since calling a compiled Theano function from within scan does not work, there are two other solutions I could find ...
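For reference, this is the usual compiled-function pattern that givens enables outside of scan. A minimal, self-contained sketch with stand-in names (x, W, b, cost, and data here are hypothetical, not the actual network):

    import numpy as np
    import theano
    import theano.tensor as T

    # stand-ins for the network pieces described above
    x = T.matrix('x')  # the layer's fixed symbolic input
    W = theano.shared(np.zeros((3, 2), dtype=theano.config.floatX))
    b = theano.shared(np.zeros(2, dtype=theano.config.floatX))
    cost = T.nnet.sigmoid(T.dot(x, W) + b).sum()  # toy cost built on x

    data = theano.shared(np.ones((10, 3), dtype=theano.config.floatX))
    index = T.lscalar('index')

    # givens swaps the fixed input for a fresh data slice on every call
    cost_fn = theano.function(inputs=[index], outputs=cost,
                              givens={x: data[index:index + 1]})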
1 - Restructure the network so that its cost is built through a chain of function calls that take the input as an argument, instead of being tied to a stored symbolic variable. This is technically simple, but requires a bit of re-coding to keep things properly organized in a multi-layer network. A sketch of what that might look like ...
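(This Layer class is my own hypothetical illustration, not the tutorial code: each layer exposes an output(x) method instead of owning a fixed self.x, so the cost expression can be rebuilt on any input inside the scan step.)

    import numpy as np
    import theano
    import theano.tensor as T

    class Layer(object):
        def __init__(self, n_in, n_out):
            self.W = theano.shared(np.zeros((n_in, n_out), dtype=theano.config.floatX))
            self.b = theano.shared(np.zeros(n_out, dtype=theano.config.floatX))

        def output(self, x):
            # build the layer's expression on whatever input it is handed
            return T.nnet.sigmoid(T.dot(x, self.W) + self.b)

    layer = Layer(3, 2)
    x = T.matrix('x')
    y = T.matrix('y')
    batch_start = T.lscalar('batch_start')
    batch_end = T.lscalar('batch_end')

    def step(curr):
        # no clone or givens needed: the input is passed in explicitly
        out = layer.output(x[curr:curr + 1])
        return ((out - y[curr:curr + 1]) ** 2).sum()  # toy per-example cost

    costs, _ = theano.scan(step, sequences=[T.arange(batch_start, batch_end)])
    cost_fn = theano.function([x, y, batch_start, batch_end], costs.sum())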
2 - Use theano.clone inside the scan step. That code looks something like this ...
    def step(curr):
        y_in = y[curr]
        # clone the cost graph, substituting the stored symbolic input
        # with the current slice of the data
        replaces = {tn.layer1.x: x[curr:curr + 1]}
        fn = theano.clone(tn.cost(y_in), replace=replaces)
        return fn

    outs, _ = theano.scan(step, sequences=[T.arange(batch_start, batch_end)])
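theano.clone returns a copy of the cost graph with layer1.x substituted by the current slice, so each step of the scan sees fresh data. Compiling the scan result might then look something like this (assuming batch_start and batch_end are symbolic integer scalars and x and y are shared variables):

    total_cost = outs.sum()
    cost_fn = theano.function(inputs=[batch_start, batch_end], outputs=total_cost)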
Both methods return the same results and run at roughly the same speed.