It depends on what you mean by "non-differentiable." The first definition that makes sense here is that PyTorch does not know how to compute the gradient, so trying to compute it will raise an error. Two scenarios are possible:
a) You are using a PyTorch operation for which the gradient has not been implemented, for example torch.svd(). In this case, you will get a TypeError:
import torch
from torch.autograd import Function
from torch.autograd import Variable

A = Variable(torch.randn(10, 10), requires_grad=True)
u, s, v = torch.svd(A)  # raises a TypeError: no gradient is implemented for svd
b) You implemented your own operation (a subclass of Function) but did not define backward(). In this case, you will get a NotImplementedError:
class my_function(Function):
    def forward(self, input):
        # an arbitrary illustrative forward(); backward() is deliberately left undefined
        return input.clone()

A = Variable(torch.randn(10, 10), requires_grad=True)
B = my_function()(A)
B.backward(torch.ones(10, 10))  # raises NotImplementedError, since backward() is missing
The second definition that makes sense is "mathematically non-differentiable." An operation that is mathematically non-differentiable should either have no backward() implemented at all or return a reasonable subgradient. Take torch.abs(), for example: its backward() method returns the subgradient 0 at 0:
A = Variable(torch.Tensor([-1, 0, 1]), requires_grad=True)
B = torch.abs(A)
B.backward(torch.Tensor([1, 1, 1]))
A.grad.data  # contains [-1, 0, 1]; the subgradient at 0 is 0
In these cases, you should go directly to the PyTorch documentation (or source code) and dig out the backward() method of the corresponding operation.
It does not matter. The point of requires_grad is to avoid unnecessary gradient computation for subgraphs. If a single input to an operation requires a gradient, its output will also require a gradient. Conversely, the output does not require a gradient only if none of its inputs do. Backward computation is never performed in subgraphs where no Variable requires gradients.
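As a minimal sketch of this propagation rule (the tensors below are illustrative):

import torch
from torch.autograd import Variable

x = Variable(torch.randn(5, 5))                      # requires_grad defaults to False
y = Variable(torch.randn(5, 5))
z = Variable(torch.randn(5, 5), requires_grad=True)

a = x + y
print(a.requires_grad)   # False: no input requires a gradient
b = a + z
print(b.requires_grad)   # True: at least one input (z) requires a gradient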
Since there will most likely be some Variables that require gradients (for example, the parameters of your nn.Module subclass), your loss Variable will automatically require gradients as well. However, note that because of how requires_grad propagates (see above), you can only change requires_grad on the leaf Variables of your graph.
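As a minimal sketch of both points (the nn.Linear layer and the shapes are illustrative):

import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Linear(10, 2)          # weight and bias are leaf Variables with requires_grad=True
x = Variable(torch.randn(3, 10))  # the input does not require gradients
out = model(x)
print(out.requires_grad)          # True, because the parameters require gradients

# Freezing the layer is done on the leaves, i.e. on the parameters themselves:
for p in model.parameters():
    p.requires_grad = False

# out is an intermediate (non-leaf) Variable, so its flag cannot be set directly;
# a line such as out.requires_grad = False would raise an error.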