Derivatives of higher-order functions

This is in the context of automatic differentiation: what would such a system do with a higher-order function such as map or filter, or even one of the SKI combinators?

Example: I have the following function:

 def func(x): return sum(map(lambda a: a**x, range(20))) 

What would its derivative be? What would the AD system give as a result? (The function is well-defined for real-valued inputs x.)

+4
3 answers

Higher-order functions are discrete objects. Their arguments do not have the Cartesian character of real-valued arguments, which map cleanly onto points in some n-dimensional space.

That said, to expand on the answer, there are a few things that can be said. Some higher-order functions could be differentiated symbolically, but only for certain call patterns that reduce to known functions.

Numerical differentiation may be more fruitful, since it can estimate the derivative simply by re-evaluating the function.
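
For example, a central-difference estimate needs nothing more than two extra evaluations of the function from the question. This is a quick sketch, not part of the original answer; the helper name central_diff and the step size h are arbitrary choices:

 def func(x):
     return sum(map(lambda a: a**x, range(20)))

 def central_diff(f, x, h=1e-6):
     # Estimate f'(x) by re-evaluating f at two nearby points.
     return (f(x + h) - f(x - h)) / (2.0 * h)

 print(central_diff(func, 10.0))  # numerical estimate of the derivative at x = 10

Choosing h well is the usual difficulty with this approach, since too large a step gives truncation error and too small a step gives round-off error.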

Moreover, for functions that are fully general (and your example heads in that direction by using relatively arbitrary functions), you run into Turing completeness, which means no trick on the part of a symbolic differentiator will be able to differentiate every such function automatically.

0

A good AD system will handle this without any difficulty. If the AD tool is a source-to-source transformer, it may have trouble with sum. But if it is an AD system that works by overloading the arithmetic operators, the AD code never actually "sees" the sum function, only the + operations that sum performs.
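
To make that concrete, here is a minimal forward-mode sketch in Python using a dual-number class. It is only an illustration of the overloading approach described above, not the code of any particular AD library; the names Dual and diff are made up for the example. Because only + and ** are overloaded, sum and map pass through untouched; the range is 1..20 to mirror the Haskell example below:

 import math

 class Dual:
     # A value paired with its derivative with respect to the input variable.
     def __init__(self, val, dot):
         self.val, self.dot = val, dot
     def __add__(self, other):
         other = other if isinstance(other, Dual) else Dual(other, 0.0)
         return Dual(self.val + other.val, self.dot + other.dot)
     __radd__ = __add__  # lets sum(), which starts from the plain int 0, work unchanged
     def __rpow__(self, base):
         # base ** self for a constant base: d/dx base**x = base**x * log(base)
         v = base ** self.val
         return Dual(v, v * math.log(base) * self.dot)

 def diff(f, x):
     return f(Dual(x, 1.0)).dot

 print(diff(lambda x: sum(map(lambda a: a**x, range(1, 21))), 10.0))  # ~7.0737e13

The printed value agrees with what the Haskell ad package reports in the answer below.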

+3

I would like to disagree with the accepted answer's claim that you cannot usefully differentiate higher-order functions, or that you need to restrict yourself to a particularly small subset of them, by demonstrating the opposite in practice.

Using my Haskell ad package, we get:

 ghci> :m + Numeric.AD
 ghci> diff (\x -> sum (map (**x) [1..20])) 10
 7.073726805128313e13

We can see what was computed by abusing a traced number type, which answers your question about what the derivative looks like:

 ghci> :m + Debug.Traced
 ghci> putStrLn $ showAsExp $ diff (\x -> sum (map (**x) [1..20])) (unknown "x" :: Traced Double)
 1.0 * (1.0 ** x * log 1.0) + 1.0 * (2.0 ** x * log 2.0) + 1.0 * (3.0 ** x * log 3.0) +
 1.0 * (4.0 ** x * log 4.0) + 1.0 * (5.0 ** x * log 5.0) + 1.0 * (6.0 ** x * log 6.0) +
 1.0 * (7.0 ** x * log 7.0) + 1.0 * (8.0 ** x * log 8.0) + 1.0 * (9.0 ** x * log 9.0) +
 1.0 * (10.0 ** x * log 10.0) + 1.0 * (11.0 ** x * log 11.0) + 1.0 * (12.0 ** x * log 12.0) +
 1.0 * (13.0 ** x * log 13.0) + 1.0 * (14.0 ** x * log 14.0) + 1.0 * (15.0 ** x * log 15.0) +
 1.0 * (16.0 ** x * log 16.0) + 1.0 * (17.0 ** x * log 17.0) + 1.0 * (18.0 ** x * log 18.0) +
 1.0 * (19.0 ** x * log 19.0) + 1.0 * (20.0 ** x * log 20.0)

With full sharing recovered, you get a scarier-looking result that is sometimes asymptotically more efficient:

 ghci> putStrLn $ showAsExp $ reShare $ diff (\x -> sum (map (**x) [1..20])) (unknown "x" :: Traced Double)
 let _21 = 1.0 ** x; _23 = log 1.0; _20 = _21 * _23; _19 = 1.0 * _20;
     _26 = 2.0 ** x; _27 = log 2.0; _25 = _26 * _27; _24 = 1.0 * _25; _18 = _19 + _24;
     _30 = 3.0 ** x; _31 = log 3.0; _29 = _30 * _31; _28 = 1.0 * _29; _17 = _18 + _28;
     _34 = 4.0 ** x; _35 = log 4.0; _33 = _34 * _35; _32 = 1.0 * _33; _16 = _17 + _32;
     _38 = 5.0 ** x; _39 = log 5.0; _37 = _38 * _39; _36 = 1.0 * _37; _15 = _16 + _36;
     _42 = 6.0 ** x; _43 = log 6.0; _41 = _42 * _43; _40 = 1.0 * _41; _14 = _15 + _40;
     _46 = 7.0 ** x; _47 = log 7.0; _45 = _46 * _47; _44 = 1.0 * _45; _13 = _14 + _44;
     _50 = 8.0 ** x; _51 = log 8.0; _49 = _50 * _51; _48 = 1.0 * _49; _12 = _13 + _48;
     _54 = 9.0 ** x; _55 = log 9.0; _53 = _54 * _55; _52 = 1.0 * _53; _11 = _12 + _52;
     _58 = 10.0 ** x; _59 = log 10.0; _57 = _58 * _59; _56 = 1.0 * _57; _10 = _11 + _56;
     _62 = 11.0 ** x; _63 = log 11.0; _61 = _62 * _63; _60 = 1.0 * _61; _9 = _10 + _60;
     _66 = 12.0 ** x; _67 = log 12.0; _65 = _66 * _67; _64 = 1.0 * _65; _8 = _9 + _64;
     _70 = 13.0 ** x; _71 = log 13.0; _69 = _70 * _71; _68 = 1.0 * _69; _7 = _8 + _68;
     _74 = 14.0 ** x; _75 = log 14.0; _73 = _74 * _75; _72 = 1.0 * _73; _6 = _7 + _72;
     _78 = 15.0 ** x; _79 = log 15.0; _77 = _78 * _79; _76 = 1.0 * _77; _5 = _6 + _76;
     _82 = 16.0 ** x; _83 = log 16.0; _81 = _82 * _83; _80 = 1.0 * _81; _4 = _5 + _80;
     _86 = 17.0 ** x; _87 = log 17.0; _85 = _86 * _87; _84 = 1.0 * _85; _3 = _4 + _84;
     _90 = 18.0 ** x; _91 = log 18.0; _89 = _90 * _91; _88 = 1.0 * _89; _2 = _3 + _88;
     _94 = 19.0 ** x; _95 = log 19.0; _93 = _94 * _95; _92 = 1.0 * _93; _1 = _2 + _92;
     _98 = 20.0 ** x; _99 = log 20.0; _97 = _98 * _99; _96 = 1.0 * _97; _0 = _1 + _96;
 in _0
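
As a cross-check (an addition, not part of the original answer), summing the per-term derivative k**x * log k directly in plain Python gives the same figure as the diff call above:

 import math
 print(sum(k**10 * math.log(k) for k in range(1, 21)))  # roughly 7.0737e13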

In general, automatic differentiation has no problem with higher-order functions. However, source-to-source translators can run into a few hiccups, depending on the limitations of the particular tool.

+3
