I have also been thinking about testing functional code. I don't have all the answers, but here are a few thoughts.
Functional programs are composed differently from OO ones, so they call for different approaches to testing.
If you take even the most superficial look at Haskell testing, you will inevitably come across QuickCheck and SmallCheck, two very famous Haskell testing libraries. They both perform property-based testing.
In OO, you typically hand-write individual tests that set up half a dozen mock objects, call a method or two, and check that the expected external methods were called with the correct data and/or that the method ultimately returns the right answer. That is quite a bit of work, so you probably end up doing it for only one or two test cases.
QuickCheck is something else. You can write a property that says something like "if I sort this list, the output should have the same number of elements as the input." That is a single line. The QuickCheck library will then automatically generate hundreds of random lists and check that the property holds for each of them. If it doesn't, it will spit out the exact input on which the test failed.
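To make that concrete, here is a minimal sketch of the sort-length property. The property names are my own, and the second property is deliberately false, just to show what a failure report looks like. It assumes the QuickCheck package is installed.

```haskell
import Data.List (sort)
import Test.QuickCheck (quickCheck)

-- Property: sorting never changes the number of elements.
-- (The name is made up for this example.)
prop_sortPreservesLength :: [Int] -> Bool
prop_sortPreservesLength xs = length (sort xs) == length xs

-- A deliberately false property: sorting is not the identity function.
prop_sortIsIdentity :: [Int] -> Bool
prop_sortIsIdentity xs = sort xs == xs

main :: IO ()
main = do
  -- Runs the property against randomly generated lists (100 by default).
  quickCheck prop_sortPreservesLength
  -- Expected to fail; QuickCheck reports the input that broke it.
  quickCheck prop_sortIsIdentity
```

Note that the property itself really is the one line; everything else is scaffolding.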
(QuickCheck and SmallCheck do roughly the same thing. QuickCheck generates random test cases, while SmallCheck systematically tries every combination up to a certain size.)
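For comparison, a rough sketch of what the SmallCheck side might look like, assuming the smallcheck package is installed. The property (sorting twice is the same as sorting once) and the depth of 4 are arbitrary choices of mine for illustration.

```haskell
import Data.List (sort)
import Test.SmallCheck (smallCheck)

-- Property: sorting an already sorted list changes nothing.
-- (The name is made up for this example.)
prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs

main :: IO ()
main =
  -- Exhaustively checks every input up to depth 4 instead of sampling at random.
  smallCheck 4 prop_sortIdempotent
```

The property looks the same as it would for QuickCheck; only the way inputs are produced differs.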
You say you're worried about a combinatorial explosion of possible control-flow paths to test, but with tools like these generating test cases for you, writing enough tests by hand is no longer the issue. The only remaining problem is making sure the generated data is varied enough to exercise every flow path.
Haskell can help here too. I read a paper about a library [I don't know if it was ever released] that actually uses Haskell's lazy evaluation to detect what the code under test does with its input. For example, it can detect whether the function under test looks at the contents of a list or only at its size. It can detect which fields of a customer record get touched. And so on. That way it still generates data automatically, but does not waste hours producing random variations of the parts of the data that are irrelevant to this particular code. (For example, if you sort customers by ID, it does not matter what is in the "Name" field.)
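This is not that library, just a small sketch of the underlying trick using only base: put an error where a value would be, run the function, and see whether the error fires. If it doesn't, the function never looked at that part of the input. All the names here (poisoned, ignoresElements, Customer and so on) are made up for the example.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.List (sortOn)

-- A list with a well-defined spine whose elements blow up if inspected.
poisoned :: [Int]
poisoned = [error "element inspected", error "element inspected"]

-- True if the function produces a result without touching any element.
ignoresElements :: ([Int] -> Int) -> IO Bool
ignoresElements f = do
  r <- try (evaluate (f poisoned)) :: IO (Either ErrorCall Int)
  return (either (const False) (const True) r)

-- A hypothetical customer record with lazy fields.
data Customer = Customer { customerId :: Int, customerName :: String }

-- Customers whose names are "poisoned" in the same way.
customers :: [Customer]
customers =
  [ Customer 2 (error "name inspected")
  , Customer 1 (error "name inspected")
  ]

main :: IO ()
main = do
  ignoresElements length >>= print   -- True: length only walks the spine
  ignoresElements sum >>= print      -- False: sum has to read every element
  -- Sorting by ID never forces the name field, so this prints [1,2].
  print (map customerId (sortOn customerId customers))
```

A test-data generator built on this idea could leave the name field unfilled until the code under test actually asks for it.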
As for testing functions that accept or produce functions ... yeah, I have no answer to that one.