Intuitive view
To answer this question correctly, we first have to establish what we mean by "bias value", as is done in the question. Neural networks are typically visualized (and explained to beginners) as a network of nodes (neurons) and weighted, directed connections between nodes. In that view, biases are very often drawn as additional "input" nodes that always have an activation level of exactly 1.0. This value of 1.0 may be what some people think of when they hear "bias value". Such a bias node has connections to other nodes, with trainable weights. Other people may think of those weights as the "bias values". Since the question was tagged bias-neuron, I will answer it under the assumption that we use the first definition: bias value = the value 1.0 of some bias node/neuron.
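To make that first definition concrete, here is a minimal sketch in NumPy (the specific inputs and weights are made up for illustration) of a single neuron whose bias is modeled as an extra input node with a constant activation of 1.0 and a trained weight:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2])   # ordinary input activations
bias_activation = 1.0       # the bias node always outputs exactly 1.0

w = np.array([0.8, 0.3])    # trained weights from the input nodes
w_bias = -0.4               # trained weight from the bias node

# The bias "value" 1.0 enters the neuron like any other input;
# only its weight w_bias is learned.
activation = sigmoid(w @ x + w_bias * bias_activation)
print(activation)
```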
From this point of view, it does not matter mathematically how many bias nodes/values we put in our network, as long as we make sure they are connected to the correct nodes. You could intuitively think of the entire network as having just a single bias node with a value of 1.0, which does not belong to any particular layer and has connections to all nodes other than the input nodes. This may be difficult to draw, though; if you want to draw a picture of your neural network, it may be more convenient to place a separate bias node (each with a value of 1.0) in every layer except the output layer, and connect each of those bias nodes to all nodes in the layer right after it. Mathematically, these two interpretations are equivalent, since in both cases every non-input node has an incoming weighted connection from a node that always has an activation level of 1.0. A small check of this equivalence is sketched below.
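Here is a small sketch (NumPy, with arbitrary random weights) that runs the same two-layer forward pass both ways: once with a separate bias node per layer, and once with one global always-1.0 node whose weights are folded into augmented weight matrices. Both produce the same output:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=3)                                 # input activations
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # hidden layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # output layer

# Interpretation A: a separate bias node (value 1.0) drawn inside each layer.
h_a = sigmoid(W1 @ x + b1 * 1.0)
y_a = sigmoid(W2 @ h_a + b2 * 1.0)

# Interpretation B: one global bias node connected to every non-input node.
# Appending its constant activation 1.0 to each layer's inputs and folding
# the bias weights into the weight matrices gives the same computation.
W1_aug = np.hstack([W1, b1[:, None]])
W2_aug = np.hstack([W2, b2[:, None]])
h_b = sigmoid(W1_aug @ np.append(x, 1.0))
y_b = sigmoid(W2_aug @ np.append(h_b, 1.0))

print(np.allclose(y_a, y_b))   # True: both drawings compute the same function
```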
Programming view
When neural networks are programmed, there usually are no explicit node "objects" at all (at least in efficient implementations). There will generally just be matrices of weights. From this point of view, there is no longer any choice. We will (almost) always want one "bias weight" (a weight multiplied by a constant activation level of 1.0) leading into every non-input node, and we will have to make sure that all of those weights appear in the right spots in our weight matrices.
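As an illustration of that layout, here is a sketch of one possible implementation (assuming tanh activations; the shapes and the `layer_forward` helper are just for illustration, and many real libraries instead store the bias as a separate vector) in which the bias weights occupy the last column of each weight matrix:

```python
import numpy as np

def layer_forward(W, x):
    """One layer's forward pass. The last column of W holds the bias
    weights; they multiply an implicit constant activation of 1.0, so
    no bias node object exists anywhere, only a column in the matrix."""
    return np.tanh(W @ np.append(x, 1.0))

rng = np.random.default_rng(1)
# Shapes: (out_dim, in_dim + 1); the "+ 1" column is the bias weights.
weights = [rng.normal(size=(4, 3 + 1)), rng.normal(size=(2, 4 + 1))]

a = rng.normal(size=3)   # input activations
for W in weights:        # the whole network is just a list of matrices
    a = layer_forward(W, a)
print(a)
```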
Dennis Soemers