Neural Net Bias per Layer or per Node (excluding input nodes)

I am looking to implement a standard neural net, with 1 input layer consisting of input nodes, 1 output layer consisting of output nodes, and N hidden layers consisting of hidden nodes. Nodes are organized in layers, with the rule that nodes within the same layer cannot be connected to each other.

I basically understand the concept of bias, and my question is this:

Should there be one bias value per layer (shared by all nodes in that layer), or should each node (except nodes in the input layer) have its own bias value?

I have a feeling that this could be done either way, and I would like to understand the trade-offs of each approach, as well as to find out which implementation is most commonly used.

artificial-intelligence neural-network

1 answer

The intuitive view

To answer this question correctly, we must first establish what is meant by a “bias value”, as the term is used in the question. Neural networks are usually viewed intuitively (and explained to beginners) as a network of nodes (neurons) and weighted, directed connections between nodes. In this view, biases are very often drawn as additional “input” nodes that always have an activation level of exactly 1.0. This value of 1.0 may be what some people mean when they say “bias value”. Such a bias node has connections to other nodes, with trained weights. Other people may instead mean those weights when they say “bias values”. Since the question was tagged bias-neuron, I will answer under the assumption that we use the first definition, i.e. bias value = 1.0 for some bias node/neuron.

From this point of view, it does not matter mathematically how many bias nodes/values we put in our network, as long as we make sure to connect them to the correct nodes. You can intuitively think of the entire network as having only a single bias node with a value of 1.0, which does not belong to any particular layer and has connections to all nodes other than the input nodes. This may be awkward to draw, though; if you want to sketch your neural network, it may be more convenient to place a separate bias node (each with a value of 1.0) in every layer except the output layer, and connect each of these bias nodes to all nodes in the layer immediately after it. Mathematically, these two interpretations are equivalent, because in both cases every non-input node has an incoming weighted connection from a node that always has an activation level of 1.0.
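As a minimal sketch of that equivalence (my own illustration, not code from the answer; NumPy and all variable names here are assumptions), appending a constant-1.0 “bias node” to a layer, with the trained bias weights as its outgoing connections, gives exactly the same result as adding a separate bias vector after the matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)

# A single layer with 3 inputs and 2 outputs.
W = rng.standard_normal((2, 3))   # ordinary connection weights
b = rng.standard_normal(2)        # one trained bias weight per node
x = rng.standard_normal(3)        # activations from the previous layer

# View 1: a separate bias vector added after the matrix product.
z1 = W @ x + b

# View 2: a "bias node" with constant activation 1.0 appended to the
# previous layer, whose outgoing weights are exactly the entries of b.
W_aug = np.hstack([W, b[:, None]])   # weight matrix gains one column
x_aug = np.append(x, 1.0)            # previous layer gains a 1.0 node
z2 = W_aug @ x_aug

assert np.allclose(z1, z2)  # the two views are mathematically identical
```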

The programming view

When neural networks are programmed, there are usually no explicit Node objects at all (at least in efficient implementations); usually there are just matrices of weights. From this point of view, there is no longer really a choice to make. We will (almost) always want one “bias weight” (a weight multiplied by a constant activation level of 1.0) leading into every non-input node, and we just need to make sure that all of these weights end up in the right places in our weight matrices.
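To make that concrete, here is a hedged sketch (again my own illustration; the tanh activation, function name, and layer sizes are arbitrary assumptions): the bias weights simply live in one vector per non-input layer, stored next to that layer's weight matrix, with one independently trained entry per node.

```python
import numpy as np

def forward(x, layers):
    """Forward pass. `layers` is a list of (W, b) pairs, one pair per
    non-input layer; b holds one trained bias weight per node."""
    a = x
    for W, b in layers:
        a = np.tanh(W @ a + b)  # bias weights applied alongside W
    return a

rng = np.random.default_rng(1)
# 4 inputs -> 5 hidden -> 3 outputs: each b has one entry per node,
# i.e. biases are per node, not a single shared value per layer.
layers = [(rng.standard_normal((5, 4)), rng.standard_normal(5)),
          (rng.standard_normal((3, 5)), rng.standard_normal(3))]
print(forward(rng.standard_normal(4), layers))
```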
