Call function from scan in Theano

I need to execute a Theano function several times inside a scan in order to sum up a cost function and use the result when calculating the gradient. I'm familiar with the deep learning tutorials that do this, but my data processing and some other complications mean I need to do it a little differently. Below is a simplified version of what I'm trying to do.

    import numpy
    import theano
    import theano.tensor as T

    tn = testnet()  # the network class (definition not shown)
    cost = tn.single_cost()
    x = theano.shared(numpy.asarray([7.1, 2.2, 3.4], dtype='float32'))
    index = T.lscalar('index')
    test_fn = theano.function(inputs=[index], outputs=cost,
                              givens={tn.x: x[index:index+1]})

    def step(curr):
        return T.constant(test_fn(curr))

    outs, _ = theano.scan(step, T.arange(2))
    out_fn = theano.function(inputs=[], outputs=outs)
    print out_fn()

Inside the scan, calling test_fn(curr) gives an error along the lines of: "Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?"

Even if I pass in an array of values instead of T.arange(2), I still get the same error. Is there a reason why you cannot call a function from inside a scan?

More generally, I'm wondering whether there is a way to call a function like this over a series of indices so that the output can be fed into a T.grad() calculation (not shown).

+5
3 answers

Do not make two different theano.functions.

A theano.function takes a symbolic relation, optimizes it, and compiles it. What you are doing here is asking theano.scan (and therefore out_fn) to treat a compiled function as if it were a symbolic relation. I'm not sure whether you could technically get that to work, but it goes against Theano's design.
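To see the distinction concretely, here is a minimal sketch (separate from your code) showing that a compiled function accepts only numeric inputs, while scan hands step() a symbolic variable:

    import theano
    import theano.tensor as T

    a = T.scalar('a')
    f = theano.function([a], a * 2)  # compiled: expects numeric inputs
    print(f(3.0))                    # fine, prints 6.0

    b = T.scalar('b')
    # f(b) raises the same kind of error: a compiled function cannot be
    # applied to a symbolic Variable like the one scan passes to step()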

Since I don't know what your cost function does, I can't give an exact example, but here is a quick snippet that works and should be fairly close to what I think you are trying to do.

    import numpy as np
    import theano
    import theano.tensor as T

    x = theano.shared(np.asarray([7.1, 2.2, 3.4], dtype=np.float32))
    v = T.vector("v")

    def fv(v):
        res, _ = theano.scan(lambda x: x ** 2, v)
        return T.sum(res)

    def f(i):
        return fv(x[i:i+2])

    outs, _ = theano.scan(f, T.arange(2))
    fn = theano.function([], outs)
    fn()
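And since everything here stays symbolic, the scan output can feed straight into T.grad, which is what you ultimately need for the cost. A sketch continuing from the snippet above:

    # outs is symbolic, so the summed cost is differentiable with
    # respect to the shared variable x
    total_cost = T.sum(outs)
    grad_x = T.grad(total_cost, x)
    grad_fn = theano.function([], grad_x)
    grad_fn()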
+3

After some research, I agree that calling a compiled function from within a function won't work. The problem with my code is that, following the basic design of the deep learning tutorials, the first layer of the network defines a symbolic variable as its input, and the output propagates up through the higher layers until the final cost is computed on top of the last layer. The tutorials use code similar to...

    class layer1(object):
        def __init__(self):
            # W and b are the layer's weights and biases (defined elsewhere)
            self.x = T.matrix()
            self.output = activation(T.dot(self.x, self.W) + self.b)

In my case, the tensor variable (layer1.x) needs to change every time the scan takes a step, so that it sees a new piece of data. The givens argument to theano.function does exactly that, but since calling a compiled Theano function from within the scan does not work, there are two other solutions that I could find...

1 - Rework the network so that its cost function is built from a series of function calls instead of a pre-declared input variable (see the sketch below). It is technically simple, but requires a bit of re-coding to organize things properly in a layered network.
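A rough sketch of what option 1 means (activation, W, and b are placeholders here, not names from my actual code):

    class Layer(object):
        def __init__(self, W, b):
            # weights are stored, but no input variable is created here
            self.W = W
            self.b = b

        def output(self, x):
            # activation is a placeholder for the nonlinearity;
            # the input arrives as an argument, so each scan step can
            # pass in a fresh slice of data
            return activation(T.dot(x, self.W) + self.b)

The cost is then built per step inside the scan body, for example something like tn.cost(x[curr:curr+1], y[curr]).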

2 - Use theano.clone inside the scan. The code looks something like this...

    def step(curr):
        y_in = y[curr]
        # swap the layer's fixed input variable for the current data slice
        replaces = {tn.layer1.x: x[curr:curr+1]}
        return theano.clone(tn.cost(y_in), replace=replaces)

    outs, _ = theano.scan(step, sequences=[T.arange(batch_start, batch_end)])

Both methods return the same results and appear to run at the same speed.

+1

Solution

The standard way is OpFromGraph (available since 0.8.2).

    import theano as th
    import theano.tensor as T

    x = T.scalar('x')
    y = T.scalar('y')
    z = x + y

    # unlike theano.function, OpFromGraph takes a list for outputs;
    # with a single output, calling the Op returns that variable directly
    op_add = th.OpFromGraph([x, y], [z])

    def my_add(x_, y_):
        return op_add(x_, y_)

    x_list = T.vector('x_li')
    # scan threads a running total through outputs_info; the last
    # element of the returned sequence is the full sum
    x_sum, _ = th.scan(my_add, sequences=[x_list], outputs_info=[T.constant(0.)])
    fn_sum = th.function([x_list], x_sum[-1])
    fn_sum([1., 2., 3., 4.])  # 10.0

What does it do?

OpFromGraph compiles a function defined by a graph and packs it into a new Op, much like defining a function in an imperative programming language.
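For example (adapted from the Theano documentation), the packed Op can be reused like any other op in a larger graph:

    import theano as th
    import theano.tensor as T

    x, y, z = T.scalars('xyz')
    e = x + y * z
    op = th.OpFromGraph([x, y, z], [e])
    # the new Op behaves like a built-in op and can appear multiple
    # times in a larger graph; the subgraph is compiled only once
    e2 = op(x, y, z) + op(z, y, x)
    fn = th.function([x, y, z], e2)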

Pros / cons

  • [+] It can be convenient in complex models.
  • [+] It saves compilation time. You can compile a commonly used part of a large model into an OpFromGraph and then reuse it in the larger model. The final graph will have fewer nodes than a direct implementation.
  • [-] It can hurt runtime performance. A function call has overhead, and the optimizer cannot perform global optimizations across the compiled boundary.
  • [-] It is immature and still under development. Its documentation is incomplete, and it currently does not support updates and givens as theano.function does.

Notes

In most cases, you should define Python functions/classes to build the model. Use OpFromGraph only when a workaround is not possible or when you want to save compilation time.

0

Source: https://habr.com/ru/post/1216575/

