Custom loss function in PyTorch

I have three simple questions.

  • What happens if my custom loss function is not differentiable? Will PyTorch throw an error or do something else?
  • If I declare a loss variable in my custom function that represents the final loss of the model, should I set requires_grad = True for that variable? Or does it not matter? If it doesn't matter, why not?
  • I have seen people sometimes write a separate layer and compute the loss in its forward function. Which approach is preferable, writing a function or a layer? Why?

I would appreciate a clear explanation of these points to resolve my confusion. Please help.

pytorch
1 answer

Let me have a go.

  • It depends on what you mean by "non-differentiable." The first definition that makes sense here is that PyTorch does not know how to compute gradients. If you try to compute gradients anyway, this will raise an error. Two scenarios are possible:

    a) You are using a PyTorch operation for which gradients have not been implemented, e.g. torch.svd() . In this case you will get a TypeError :

      import torch
      from torch.autograd import Function
      from torch.autograd import Variable

      A = Variable(torch.randn(10, 10), requires_grad=True)
      u, s, v = torch.svd(A)  # raises TypeError

    b) You implemented your own operation but did not define backward() . In this case, you get a NotImplementedError (a corrected version with backward() defined is sketched just after this snippet):

      class my_function(Function):  # forgot to define backward()
          def forward(self, x):
              return 2 * x

      A = Variable(torch.randn(10, 10))
      B = my_function()(A)
      C = torch.sum(B)
      C.backward()  # will raise NotImplementedError
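
    For comparison, here is a minimal sketch of the same Function with backward() defined, following the legacy Variable/Function style used above (newer PyTorch versions define static forward() / backward() methods with a ctx argument instead):

      class my_function(Function):
          def forward(self, x):
              return 2 * x

          def backward(self, grad_output):
              # d(2x)/dx = 2, so scale the incoming gradient by 2
              return 2 * grad_output

      A = Variable(torch.randn(10, 10), requires_grad=True)
      B = my_function()(A)
      C = torch.sum(B)
      C.backward()  # now succeeds; A.grad is a tensor filled with 2s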

    The second definition that makes sense is "mathematically non-differentiable." Obviously, an operation that is mathematically non-differentiable should either not have a backward() method implemented, or should have a sensible sub-gradient. Consider torch.abs() , for example, whose backward() method returns the subgradient 0 at 0:

      A = Variable(torch.Tensor([-1, 0, 1]), requires_grad=True)
      B = torch.abs(A)
      B.backward(torch.Tensor([1, 1, 1]))
      A.grad.data  # holds the subgradients: [-1, 0, 1]

    In these cases, you should refer to the PyTorch documentation and dig out the backward() method of the respective operation directly.

  • It does not matter. requires_grad exists to avoid unnecessary gradient computations for subgraphs. If a single input to an operation requires a gradient, its output will also require a gradient. Conversely, the output does not require a gradient only if none of the inputs do. Backward computation is never performed in subgraphs where no variable requires gradients.

    Since, most likely, some Variables (for example, the parameters of a subclass of nn.Module() ) require gradients, your loss variable will also require gradients automatically. However, you should note that, precisely because of how requires_grad works (see above), you can only change requires_grad for leaf variables of your graph anyway; a small sketch of this propagation follows below.
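
    As a minimal sketch of this propagation (the names w and x are made up for illustration), note that the loss inherits requires_grad from its inputs without it ever being set explicitly:

      import torch
      from torch.autograd import Variable

      w = Variable(torch.randn(3), requires_grad=True)  # leaf, requires gradients
      x = Variable(torch.randn(3))                      # leaf, does not
      loss = torch.sum((w * x) ** 2)                    # no requires_grad=True needed here

      print(loss.requires_grad)  # True, inherited from w
      loss.backward()
      print(w.grad)              # populated
      print(x.grad)              # None, x never required gradients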

  • All the built-in PyTorch loss functions are subclasses of _Loss , which is itself a subclass of nn.Module (see the definitions in the PyTorch source). If you want to stick to this convention, you should subclass _Loss when defining your custom loss function. Apart from consistency, one advantage is that your subclass will raise an AssertionError if you haven't marked your target variables as volatile or requires_grad = False . Another advantage is that you can nest your loss function in nn.Sequential() , because it is a nn.Module . I would recommend this approach for these reasons; a sketch of a loss written as a module follows below.
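
    As an illustration, here is a minimal sketch of a custom loss written as a module. It subclasses nn.Module directly for simplicity (subclassing _Loss works the same way, since _Loss extends nn.Module); the class name MyHingeLoss and the margin formula are made up for this example and are not part of the answer above:

      import torch
      import torch.nn as nn
      from torch.autograd import Variable

      class MyHingeLoss(nn.Module):
          def __init__(self, margin=1.0):
              super(MyHingeLoss, self).__init__()
              self.margin = margin

          def forward(self, output, target):
              # target holds -1/+1 labels; only built-in ops are used, so
              # autograd differentiates this without a hand-written backward()
              return torch.clamp(self.margin - output * target, min=0).mean()

      output = Variable(torch.randn(5), requires_grad=True)
      target = Variable(torch.Tensor([1, -1, 1, 1, -1]))
      criterion = MyHingeLoss()
      loss = criterion(output, target)  # used like any built-in criterion
      loss.backward()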
