I am trying to implement a neural (ish) network in keras with this design: http://nlp.cs.rpi.edu/paper/AAAI15.pdf
The algorithm essentially has three inputs. Inputs 2 and 3 are multiplied by the same weight matrix W1 to obtain O2 and O3. Input 1 is multiplied by W2 to obtain O1. Then we need to take the dot products O1 * O2 and O1 * O3.
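For reference, the computation I'm after looks roughly like this in plain NumPy (the dimensions here are placeholders I picked for illustration, not taken from the paper):

import numpy as np

# toy dimensions, chosen only for illustration
d_in, d_hidden = 300, 128

W1 = np.random.randn(d_in, d_hidden)   # shared by Input 2 and Input 3
W2 = np.random.randn(d_in, d_hidden)   # used by Input 1 only

I1 = np.random.randn(d_in)
I2 = np.random.randn(d_in)
I3 = np.random.randn(d_in)

O1 = np.dot(I1, W2)   # (d_hidden,)
O2 = np.dot(I2, W1)   # (d_hidden,)  same W1 ...
O3 = np.dot(I3, W1)   # (d_hidden,)  ... as here

s_pos = np.dot(O1, O2)   # scalar: O1 . O2
s_neg = np.dot(O1, O3)   # scalar: O1 . O3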
I am trying to implement this in keras.
My first thought was to use the keras Graph class and make W1 a shared node layer with two inputs and two outputs. Fine so far.
Then the problem arises of how to take the dot products of those two outputs with O1.
I tried to define a custom function:
def layer_mult(X, Y):
    return K.dot(X, K.transpose(Y))
Then:
ntm.add_node(Lambda(layer_mult, output_shape = (1,1)), name = "ls_pos", inputs = ["O1", "O2"])
ntm.add_node(Lambda(layer_mult, output_shape = (1,1)), name = "ls_neg", inputs = ["O1", "O3"])
The problem is that, at compile time, keras only gives the Lambda layer a single input:
   1045         func = types.FunctionType(func, globals())
   1046         if hasattr(self, 'previous'):
-> 1047             return func(self.previous.get_output(train))
   1048         else:
   1049             return func(self.input)

TypeError: layer_mult() takes exactly 2 arguments (1 given)
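One thing I have not tried yet is letting the Graph merge the two inputs first (merge_mode = "concat") and splitting the merged tensor back apart inside the Lambda. Roughly (untested, and the slice index 128 is just the hidden size from my setup further down):

from keras import backend as K
from keras.layers.core import Lambda

def layer_mult_concat(XY):
    # XY is assumed to be O1 and O2 (or O3) concatenated along the last axis
    X = XY[:, :128]
    Y = XY[:, 128:]
    return K.sum(X * Y, axis=1, keepdims=True)  # per-sample dot product

ntm.add_node(Lambda(layer_mult_concat, output_shape = (1,)),
             name = "ls_pos", inputs = ["O1", "O2"], merge_mode = "concat")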
I thought an alternative would be to use the Merge class, which allows dot as a merge type. But the input layers for the Merge class have to be passed to its constructor. So there seems to be no way to feed the outputs of a shared node into a Merge and then add that Merge to the Graph.
If I used Sequential containers, I could feed those into a Merge. But then there would be no way to make the layers inside the two Sequential containers share the same weight matrix W1.
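To make the weight-sharing problem concrete, the Sequential/Merge version would look something like this sketch (the import paths and dimensions are just my guess at the equivalent setup), where each branch ends up with its own independent Dense weights instead of one shared W1:

from keras.models import Sequential
from keras.layers.core import Dense, Merge

n_docs, d_hidden = 1000, 128

left = Sequential()
left.add(Dense(d_hidden, input_dim = n_docs))   # this Dense would have to be W1 ...
right = Sequential()
right.add(Dense(d_hidden, input_dim = n_docs))  # ... and so would this one, but they are two separate layers

scorer = Sequential()
scorer.add(Merge([left, right], mode = 'dot'))  # dot merge is allowed, but the weights above are not tied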
I was thinking of trying to combine O1, O2 and O3 into a single vector as the output layer, and then do the multiplication inside the objective function. But that would require the objective function to share input data with the model, which does not seem possible in keras (the relevant Theano functions are not exposed through the keras API).
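For completeness, what I had in mind there was roughly the following loss, splitting a concatenated [O1 | O2 | O3] output back apart (a sketch only; the slicing layout and the final expression are placeholders, not the objective from the paper):

from keras import backend as K

def obj_from_concat(y_true, y_pred):
    # y_pred assumed to be O1, O2, O3 concatenated along the last axis, 128 units each
    O1 = y_pred[:, :128]
    O2 = y_pred[:, 128:256]
    O3 = y_pred[:, 256:384]
    s_pos = K.sum(O1 * O2, axis=1)   # per-sample dot product O1 . O2
    s_neg = K.sum(O1 * O3, axis=1)   # per-sample dot product O1 . O3
    return K.mean(s_neg - s_pos)     # placeholder; the real objective would go here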
Does anyone know a solution?
EDIT:
I thought I had made some progress when I found that shared_node implements dot (even though it is not in the documentation).
So, I got to:
ntm = Graph()
ntm.add_input(name='g', input_shape=(300,)) # Vector of 300 units, normally distributed around zero
ntm.add_node([pretrained bit], name = "lt", input = "g") # 300 * 128, output = (,128)

n_docs = 1000
ntm.add_input("d_pos", input_shape = (n_docs,)) # (,n_docs)
ntm.add_input("d_neg", input_shape = (n_docs,)) # (,n_docs)

ntm.add_shared_node(Dense(128, activation = "softmax",
                          # weights = pretrained_W1,
                          W_constraint = unitnorm(),
                          W_regularizer = l2(0.001)),
                    name = "ld", inputs = ["d_pos", "d_neg"],
                    outputs = ["ld_pos", "ld_neg"],
                    merge_mode=None) # n_docs * 128, output = (,128) * 2

# ActivityRegularization is being used as a passthrough - the function of the node is to dot* its inputs
ntm.add_shared_node(ActivityRegularization(0,0),
                    name = "ls_pos", inputs = ["lt", "d_pos"],
                    merge_mode = 'dot') # output = (,1)
ntm.add_shared_node(ActivityRegularization(0,0),
                    name = "ls_neg", inputs = ["lt", "d_neg"],
                    merge_mode = 'dot') # output = (,1)
ntm.add_shared_node(ActivityRegularization(0,0),
                    name = "summed", inputs = ["ls_pos", "ls_neg"],
                    merge_mode = 'sum') # output = (,1)

ntm.add_node(ThresholdedReLU(0.5), input = "summed", name = "loss") # output = (,1)
ntm.add_output(name = "loss_out", input = "loss")

def obj(X, Y):
    return K.sum(Y)

ntm.compile(loss = {'loss_out' : obj}, optimizer = "sgd")
And now the error:
>>> ntm.compile(loss = {'loss_out' : obj}, optimizer = "sgd")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.macosx-10.5-x86_64/egg/keras/models.py", line 602, in compile
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/advanced_activations.py", line 149, in get_output
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 117, in get_input
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 1334, in get_output
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 1282, in get_output_sum
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 1266, in get_output_at
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 730, in get_output
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 117, in get_input
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 1340, in get_output
  File "build/bdist.macosx-10.5-x86_64/egg/keras/layers/core.py", line 1312, in get_output_dot
  File "/Volumes/home500/anaconda/envs/[-]/lib/python2.7/site-packages/theano/tensor/var.py", line 360, in dimshuffle
    pattern)
  File "/Volumes/home500/anaconda/envs/[-]/lib/python2.7/site-packages/theano/tensor/elemwise.py", line 164, in __init__
    (input_broadcastable, new_order))
ValueError: ('You cannot drop a non-broadcastable dimension.', ((False, False, False, False), (0, 'x')))