(rewritten for clarity)
There seem to be two parts to your question. One of them is implicit: what is an algebraic interpretation of `forall`? The other is about the `Cont`/Yoneda transformation, which sdcvvc's answer already covers quite well.
I'll try to address the algebraic interpretation of `forall` for you. You note that `A -> B` is `B^A`, but I'd like to take that a step further and expand it to `B * B * B * ... * B` (`|A|` times). Although we have exponentiation as notation for repeated multiplication, there is a more flexible notation, `∏` (capital Pi), for arbitrary indexed products. `∏` has two components: the range of values we take the product over, and the expression being multiplied. For example, at the value level, you might express the factorial function as `fact i = ∏ [1..i] (λx -> x)`.
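As a value-level sketch of the `∏` notation, here is a hypothetical `bigPi` helper (the names `bigPi` and `fact` are my own, not standard library functions) that takes the same two components, a range and an expression:

```haskell
-- A value-level indexed product, standing in for the ∏ notation:
-- multiply (f x) over every x in the given range.
bigPi :: [Integer] -> (Integer -> Integer) -> Integer
bigPi range f = product (map f range)

-- The factorial from the text, fact i = ∏ [1..i] (λx -> x):
fact :: Integer -> Integer
fact i = bigPi [1 .. i] (\x -> x)
```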
Returning to the world of types, we can view the exponential operator in the `A -> B ~ B^A` correspondence as a `∏`: `B^A ~ ∏ A (λ_ -> B)`. This says we're defining an `A`-ary product of `B`s, where `B` is not allowed to depend on which element of `A` was chosen. Sure, this is equivalent to plain exponentiation, but it lets us move on to cases where there *is* a dependency.
In the most general case we get dependent types, like what you see in Agda or Coq: in Agda syntax, `replicate : Bool -> ((n : Nat) -> Vec Bool n)` is one possible use of a Pi type, which could be expressed more explicitly as `replicate : Bool -> ∏ Nat (Vec Bool)`, or further as `replicate : ∏ Bool (λ_ -> ∏ Nat (Vec Bool))`.
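A rough Haskell approximation of that Agda signature, using GADTs and promoted data kinds (the names `Vec`, `SNat`, and `vreplicate` are mine; Haskell can't range over `Nat` directly the way Agda's Pi does, so the length is passed as a singleton):

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- A length-indexed vector: the *type* depends on the Nat index,
-- mirroring Agda's Vec Bool n.
data Vec (a :: *) (n :: Nat) where
  VNil  :: Vec a 'Z
  VCons :: a -> Vec a n -> Vec a ('S n)

-- A runtime witness for a type-level Nat.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- replicate : (n : Nat) -> Vec b n, with n supplied as a singleton.
vreplicate :: b -> SNat n -> Vec b n
vreplicate _ SZ     = VNil
vreplicate x (SS m) = VCons x (vreplicate x m)

-- Forget the index, for inspection.
vToList :: Vec a n -> [a]
vToList VNil         = []
vToList (VCons x xs) = x : vToList xs
```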
Note that, as you might expect from the underlying algebra, you can fuse the two `∏`s in the `replicate` signature above into a single `∏` ranging over the Cartesian product of the domains: `∏ Bool (λ_ -> ∏ Nat (Vec Bool))` is equivalent to `∏ (Bool, Nat) (λ(_, n) -> Vec Bool n)`, exactly as at the value level. It's just messier to express in a type theory.
I understand your question was about polymorphism, so I won't dwell on dependent types, but they are relevant: `forall` in Haskell is roughly equivalent to a `∏` whose domain is the kind of types, `*`. Indeed, the function-like behavior of polymorphism can be seen directly in GHC Core, which represents it with type-level lambdas (Λ). So the polymorphic type `forall a. a -> a` is really just `∏ * (Λ a -> (a -> a))` (using Λ now to distinguish type-level functions from value-level ones), which can be expanded into an infinite product `(Bool -> Bool, Int -> Int, () -> (), (Int -> Bool) -> (Int -> Bool), ...)` with one component for each possible type. Instantiating a type variable is simply projecting the appropriate component out of that `*`-ary product (equivalently, applying the type-level function).
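With GHC's `TypeApplications` extension you can watch this projection happen explicitly (a small illustration; the names `polyId`, `idInt`, and `idBool` are mine):

```haskell
{-# LANGUAGE TypeApplications #-}

-- One polymorphic value, standing for an infinite product of
-- monomorphic identity functions.
polyId :: a -> a
polyId x = x

-- Each type application projects out one component of that product:
idInt :: Int -> Int
idInt = polyId @Int

idBool :: Bool -> Bool
idBool = polyId @Bool
```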
Now, for the big piece I glossed over in my original version of this answer: parametricity. Parametricity can be described in several different ways, but none of the ones I know (viewing types as relations, or (di)naturality in category theory) really has a nice algebraic flavor. For our purposes, though, it boils down to something fairly simple: you cannot pattern-match on `*`. I know GHC lets you do that at the type level with type families, but a type family can only cover a finite fragment of `*`, so there will always be points at which it is undefined.
From the standpoint of polymorphism, this means that any type function `F` we write in `∏ * F` must either be constant (i.e., completely ignore the type it is polymorphic over) or pass the type through unchanged. So `∏ * (Λ _ -> B)` is fine, since it ignores its argument; it corresponds to `forall a. B`. The other case is something like `∏ * (Λ x -> Maybe x)`, which corresponds to `forall a. Maybe a`; it doesn't ignore its type argument, but only "passes it through". As such, a `∏ A` over an irrelevant domain `A` (e.g., when `A = *`) can be thought of as an `A`-indexed *intersection* (taking the elements common to all instantiations of the index), rather than a product.
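The two cases can be illustrated concretely (illustrative names of my choosing; the claim that `Nothing` is the only total inhabitant is the parametricity argument from the text, not something GHC checks):

```haskell
-- "Pass-through": forall a. Maybe a mentions a but cannot inspect it.
-- Parametricity leaves Nothing as the only total inhabitant -- the
-- element common to Maybe Bool, Maybe Int, ... (an intersection).
nothingPoly :: Maybe a
nothingPoly = Nothing

-- "Constant" in the type argument: the body never examines a, so the
-- a-indexed product of results collapses to a single behavior.
alwaysZero :: a -> Int
alwaysZero _ = 0
```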
Most importantly, at the value level, the rules of parametricity prevent any funny business that might suggest the types are larger than they really are. Because we have no typecase, we cannot construct a value of type `forall a. B` that does something different depending on which `a` it was instantiated at. So although the type is technically a function `* -> B`, it is always a constant function, and thus equivalent to a single value of `B`. Under the `∏` interpretation, it really is equivalent to an infinite `*`-ary product of `B`s, but those `B` values must all be identical, so the infinite product effectively collapses to a single `B`.
Similarly, although `∏ * (Λ x -> (x -> x))` (a.k.a. `forall a. a -> a`) is technically equivalent to an infinite product of functions, none of those functions can inspect the type, so all of them are constrained to simply returning their input value, with no funny business like applying `(+1) :: Int -> Int` at the `Int` instantiation. Because there is only one function (up to totality of the language) that cannot inspect the type of its argument yet must return a value of that same type, the infinite product is again equivalent to a single value.
Now, your direct question about `(forall r. (a -> r) -> r) ~ a`. First, let's express the `~` operator more formally. It is really an isomorphism, so we need two functions going back and forth, plus an argument that they are inverses:
```haskell
data Iso a b = Iso
  { to   :: a -> b
  , from :: b -> a
  -- proof1 :: forall x. to (from x) == x
  -- proof2 :: forall x. from (to x) == x
  }
```
Now we can express your original question in these more formal terms. Your question boils down to constructing the following term (which is impredicative, so GHC will have trouble with it, but we'll survive):
```haskell
forall a. Iso (forall r. (a -> r) -> r) a
```
Which, using my terminology from earlier, is `∏ * (Λ a -> Iso (∏ * (Λ r -> ((a -> r) -> r))) a)`. Once again we have an infinite product that cannot inspect its type argument. With some hand-waving, we can argue that the only values of `to` and `from` permitted by the parametricity rules (the two proof obligations then hold automatically) are `($ id)` and `flip id`.
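A sketch of the two directions in ordinary Haskell with `RankNTypes` (the names `toCont` and `fromCont` are my own; they are eta-expanded spellings of `flip id` and `($ id)`):

```haskell
{-# LANGUAGE RankNTypes #-}

-- from :: (forall r. (a -> r) -> r) -> a is ($ id): hand the
-- continuation-taker the identity continuation.
fromCont :: (forall r. (a -> r) -> r) -> a
fromCont f = f id

-- to :: a -> (a -> r) -> r is flip id: given a value, apply
-- whatever continuation arrives to it.
toCont :: a -> (a -> r) -> r
toCont a k = k a
```

Note that `fromCont (toCont x)` reduces to `id x = x` by computation alone; it is the other round trip, `toCont (fromCont f) == f`, that needs parametricity.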
If this seems unsatisfying, it's probably because the algebraic interpretation of `forall` didn't really add anything to the proof: it really is just plain old type theory. But I hope I've provided something that feels a bit less categorical than the Yoneda story. It's worth noting that we don't actually need parametricity to write `proof1` and `proof2`; parametricity only enters the picture when we want to claim that `($ id)` and `flip id` are our *only* options for `to` and `from` (which is also why we cannot prove that claim inside Agda or Coq).