Deep learning - a series of naive questions about caffe

I am trying to understand the basics of caffe, in particular how to use it with python.

My understanding is that the model definition (for example, a given neural network architecture) should be included in the '.prototxt' file.

And when you train the model on the data using the '.prototxt', the learned weights / model parameters are saved in a '.caffemodel' file.

In addition, there is a difference between the '.prototxt' file used for training (which includes the training and regularization parameters) and the one used for testing / deployment, which does not include them.

Questions:

  • Is it correct that the '.prototxt' is the basis for training, and that the '.caffemodel' is the result of training (the weights) obtained by training the '.prototxt' on the training data?
  • Is it right that there is one '.prototxt' for training and one for testing, and that there are only slight differences between them (e.g. the learning rate and regularization factors in the training one), but that the nn architecture (provided you use neural networks) is the same?

Apologies for such basic questions and perhaps some very incorrect assumptions; I have been doing some online research, and the lines above summarize my understanding so far.

+7
python deep-learning neural-network caffe pycaffe
2 answers

Let's look at one of the examples provided with BVLC/caffe: bvlc_reference_caffenet .
You will notice that there are actually 3 '.prototxt' files:

  • train_val.prototxt : this file describes the network architecture for the training phase.
  • deploy.prototxt : this file describes the network architecture used at test time ("deployment").
  • solver.prototxt : this file is very small and contains the "meta-parameters" for training, for example the learning rate, regularization, etc.
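
If you are curious what these meta-parameters look like from python, the solver file can be parsed with the protobuf definitions that ship with pycaffe. This is only a small sketch; the path 'solver.prototxt' is a placeholder and the printed fields are just a few of the common ones:

    from caffe.proto import caffe_pb2
    from google.protobuf import text_format

    # Parse the solver definition into a SolverParameter protobuf message.
    solver_param = caffe_pb2.SolverParameter()
    with open('solver.prototxt') as f:            # placeholder path
        text_format.Merge(f.read(), solver_param)

    # A few of the training meta-parameters stored there:
    print(solver_param.net)           # path to train_val.prototxt
    print(solver_param.base_lr)       # base learning rate
    print(solver_param.max_iter)      # number of training iterations
    print(solver_param.weight_decay)  # regularization strength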

The net architecture described by train_val.prototxt and deploy.prototxt should be mostly the same. There are a few differences between the two:

  • Input data: during training one usually uses a predefined set of inputs for training / validation. Therefore, train_val usually contains an explicit input layer, for example an "HDF5Data" or "Data" layer. On the other hand, deploy usually does not know in advance what inputs it will receive; it only contains a declaration:

     input: "data" input_shape { dim: 10 dim: 3 dim: 227 dim: 227 } 

    which declares what input the network expects and what its shape should be.
    Alternatively, you can add an "Input" layer:

     layer {
       name: "input"
       type: "Input"
       top: "data"
       input_param { shape { dim: 10 dim: 3 dim: 227 dim: 227 } }
     }
  • Input labels: during training we supply the network with the expected "ground truth" outputs; this information is obviously not available during deploy .
  • Loss layers: during training you must define a loss layer. This layer tells the solver in which direction to adjust the parameters at each iteration. The loss compares the net's current prediction with the expected "ground truth". The gradient of the loss is back-propagated through the rest of the network, and this is what drives the learning process. During deploy there is no loss and no backpropagation.

In caffe, you supply a train_val.prototxt describing the net, the train / val datasets, and the loss. In addition, you supply a solver.prototxt describing the meta-parameters for training. The output of the training process is a binary .caffemodel file containing the trained net parameters.
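
A minimal sketch of how this training step can be driven from python with pycaffe (the solver path and the output file name are placeholders; snapshots are normally written automatically according to solver.prototxt):

    import caffe

    caffe.set_mode_cpu()                         # or caffe.set_mode_gpu()

    # The solver file points at train_val.prototxt and holds the meta-parameters.
    solver = caffe.SGDSolver('solver.prototxt')  # placeholder path

    solver.solve()                               # run the full training loop
    # or advance a fixed number of iterations: solver.step(100)

    # Save the current weights explicitly (snapshots are also written
    # automatically as configured in solver.prototxt).
    solver.net.save('my_model.caffemodel')       # placeholder name
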
After the network has been trained, you can use deploy.prototxt together with the .caffemodel parameters to predict outputs for new and unseen inputs.
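
For completeness, a rough pycaffe sketch of this deployment step (the file names and the output blob name "prob" follow the bvlc_reference_caffenet example and will differ for your own net; the random array stands in for a properly preprocessed image batch):

    import numpy as np
    import caffe

    caffe.set_mode_cpu()

    # Load the deploy architecture together with the trained weights.
    net = caffe.Net('deploy.prototxt',
                    'bvlc_reference_caffenet.caffemodel',
                    caffe.TEST)

    # The "data" blob must match the input_shape declared in deploy.prototxt;
    # it can be reshaped to the batch size you actually want to process.
    net.blobs['data'].reshape(1, 3, 227, 227)
    net.blobs['data'].data[...] = np.random.rand(1, 3, 227, 227)  # placeholder input

    out = net.forward()
    print(out['prob'].argmax())   # index of the predicted class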

+12

Yes, but there are different .prototxt files, for example

https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_train_test.prototxt

this is for training and testing the network

for training on the command line, you can use a solver file, which is also a .prototxt file, for example

https://github.com/BVLC/caffe/blob/master/examples/mnist/lenet_solver.prototxt

0
