I have a large matrix.
I create this variable in a simple way, partitioned into a number of shards:
softmax_w = tf.get_variable("softmax_w", [hps.vocab_size, hps.projected_size], partitioner=tf.fixed_size_partitioner(hps.num_shards, 0))
Variable creation log:
model/softmax_w/part_0:0 (99184, 512) /cpu:0
model/softmax_w/part_1:0 (99184, 512) /cpu:0
model/softmax_w/part_2:0 (99184, 512) /cpu:0
model/softmax_w/part_3:0 (99184, 512) /cpu:0
model/softmax_w/part_4:0 (99184, 512) /cpu:0
model/softmax_w/part_5:0 (99184, 512) /cpu:0
model/softmax_w/part_6:0 (99183, 512) /cpu:0
model/softmax_w/part_7:0 (99183, 512) /cpu:0
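For reference, the shard shapes in this log are what you would expect if the rows are split as evenly as possible along axis 0, with the leftover rows going to the first shards. Below is a minimal pure-Python sketch of that arithmetic. It is not TensorFlow's own code: the helper name `fixed_size_shard_rows` is hypothetical, and the total row count is an assumption taken from summing the logged shard sizes.

```python
def fixed_size_shard_rows(total_rows, num_shards):
    """Split total_rows into num_shards near-equal parts, larger parts first."""
    base, extra = divmod(total_rows, num_shards)
    # The first `extra` shards each take one additional row.
    return [base + 1 if i < extra else base for i in range(num_shards)]


# Assumption: the vocabulary size equals the sum of the logged shard sizes.
vocab_size = 6 * 99184 + 2 * 99183
num_shards = 8  # eight parts appear in the log

shard_rows = fixed_size_shard_rows(vocab_size, num_shards)
for i, rows in enumerate(shard_rows):
    print("model/softmax_w/part_%d:0 (%d, 512)" % (i, rows))
```

Under these assumptions the printed shapes match the log: six shards of 99184 rows followed by two of 99183.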
Training runs successfully, but when I try to restore the model, I get this error:
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_7 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_6 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_5 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_4 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_3 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_2 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_1 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_0 not found in checkpoint
W tensorflow/core/framework/op_kernel.cc:975] Not found: Key model/softmax_w/part_7 not found in checkpoint
I found that TensorFlow saves the partitioned variable under a single name. The checkpoint contains only one softmax_w entry; it is no longer stored as a partitioned variable.
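To make the mismatch concrete, here is a small pure-Python sketch that compares the names the restore step asks for against the keys present in the checkpoint. Both lists are hypothetical stand-ins built from the error messages and the observation above; in a real session they would be read from the graph and from the checkpoint file. The helper names `missing_keys` and `partition_base` are illustrative only.

```python
import re


def missing_keys(graph_var_names, checkpoint_keys):
    """Return graph variable names that have no matching checkpoint key."""
    return sorted(set(graph_var_names) - set(checkpoint_keys))


def partition_base(name):
    """Strip a trailing '/part_<n>' suffix, if present."""
    return re.sub(r"/part_\d+$", "", name)


# Names requested at restore time, as they appear in the error messages.
graph_vars = ["model/softmax_w/part_%d" % i for i in range(8)]
# Assumption: the checkpoint holds one unpartitioned entry, as observed.
checkpoint_keys = ["model/softmax_w"]

missing = missing_keys(graph_vars, checkpoint_keys)
# Base names of the missing parts that ARE present in the checkpoint.
bases_in_ckpt = sorted({partition_base(n) for n in missing} & set(checkpoint_keys))

print("missing:", missing)
print("present under base name:", bases_in_ckpt)
```

If every missing part maps back to a base name that is present, the data was saved, just under a different key than the one the restore step looks up.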
python deep-learning machine-learning tensorflow
yi wang