Curve fitting: find the smoothest function that satisfies a list of constraints

Consider the set of non-decreasing surjective (onto) functions from (-inf, inf) to [0,1]. (Typical CDFs satisfy this property.) In other words, for any real number x, 0 <= f(x) <= 1. The logistic function is perhaps the most famous example.

Now we are given some constraints in the form of a list of x-values and, for each x-value, a pair of y-values between which the function should lie. We can represent this as a list of {x, ymin, ymax} triples, such as

constraints = {{0, 0, 0}, {1, 0.00311936, 0.00416369}, {2, 0.0847077, 0.109064}, {3, 0.272142, 0.354692}, {4, 0.53198, 0.646113}, {5, 0.623413, 0.743102}, {6, 0.744714, 0.905966}} 

Graphically, it looks like this:

[Image: constraints on a CDF (source: yootles.com)]

Now we are looking for a curve that takes these constraints into account. For example:

[Image: a fitted CDF (source: yootles.com)]

Let's first try simple interpolation through the middle of the constraints:

 mids = ({#1, Mean[{#2, #3}]} &) @@@ constraints
 f = Interpolation[mids, InterpolationOrder -> 0]

On the graph, f looks like this:

[Image: the interpolated CDF (source: yootles.com)]

That function is not surjective. Also, we would like it to be smoother. We can increase the interpolation order, but then it violates the constraint that the range be [0,1]:

[Image: the interpolated CDF with a higher interpolation order (source: yootles.com)]
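The higher-order fit above might have been produced with something like the following (a guess at the call; the exact order isn't stated):

 f3 = Interpolation[mids, InterpolationOrder -> 3]
 Plot[f3[x], {x, 0, 6}]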

Thus, the goal is to find the smoothest function that satisfies the constraints:

  1. Non-decreasing.
  2. Tends to 0 when x approaches negative infinity, and tends to 1 when x approaches infinity.
  3. Passes through this list of y-error-bars.

The example fit I showed above seems like a good candidate, but I produced it with Mathematica's FindFit function, assuming a lognormal CDF. That works well in this particular case, but in general there need not be a lognormal CDF that satisfies the constraints.
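A sketch of that kind of fit (the exact call isn't shown here) would be something along these lines, using the mids list defined above:

 (* the {0, 0} midpoint is automatically satisfied: a lognormal CDF is 0 at x = 0 *)
 lnfit = FindFit[mids, CDF[LogNormalDistribution[mu, sigma], x], {mu, sigma}, x]
 lncdf[x_] = CDF[LogNormalDistribution[mu, sigma], x] /. lnfit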

+7
math wolfram-mathematica plot curve-fitting cdf
3 answers

I do not think that you have specified enough criteria to make the desired CDF unique.

If the only criteria that must be met are:

  1. CDF should be "pretty smooth" (see below)
  2. CDF must be non-decreasing
  3. CDF must pass through the y "error bar" intervals
  4. CDF should tend to 0 as x → -Infinity
  5. CDF should tend to 1 as x → +Infinity.

then perhaps you could use monotone cubic interpolation. This will give you a C^2 (twice continuously differentiable) function which, unlike ordinary cubic splines, guarantees monotonicity for monotonic data.

This leaves open the question of exactly which data points to feed to the monotone cubic interpolation. If you take the center (mean) of each error bar, are the resulting data points guaranteed to be monotonically increasing? If not, you can make an arbitrary choice that keeps the selected points monotonically increasing (since the criteria do not force the choice to be unique).

What about the last data point? Is there an X that is guaranteed to be larger than any x in the constraint data set? Perhaps you can again make an arbitrary choice of convenience: pick a very large X and use (X, 1) as the final data point.
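To make this concrete, here is a rough Mathematica sketch of the Fritsch-Carlson scheme for monotone cubic interpolation (an illustration, not code from the original answer); note that this particular scheme only guarantees a C^1 interpolant. It assumes data is a list of {x, y} pairs with strictly increasing x-values and monotone y-values:

 monotoneCubic[data_List] := Module[{xs, ys, n, dx, delta, m, a, b, tau},
   {xs, ys} = Transpose[SortBy[data, First]];
   n = Length[xs];
   dx = Differences[xs];
   delta = Differences[ys]/dx;               (* secant slopes *)
   m = Table[Which[                          (* initial tangents *)
       k == 1, delta[[1]],
       k == n, delta[[n - 1]],
       True, Mean[{delta[[k - 1]], delta[[k]]}]], {k, n}];
   Do[If[delta[[k]] == 0,
       m[[k]] = 0; m[[k + 1]] = 0,
       a = m[[k]]/delta[[k]]; b = m[[k + 1]]/delta[[k]];
       If[a^2 + b^2 > 9,                     (* limit tangents to keep monotonicity *)
         tau = 3/Sqrt[a^2 + b^2];
         m[[k]] = tau a delta[[k]];
         m[[k + 1]] = tau b delta[[k]]]],
     {k, n - 1}];
   (* piecewise cubic Hermite through the points, using the limited tangents *)
   Interpolation[MapThread[{{#1}, #2, #3} &, {xs, ys, m}]]]

For example, monotoneCubic[mids] (with mids from the question) gives a monotone interpolant through the error-bar midpoints, provided those midpoints are themselves monotone.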

Comment 1: Your problem can be broken down into two sub-problems:

  1. Given exact points (x_i, y_i) that the CDF should pass through, how do you generate the CDF? I suspect there are infinitely many possible solutions, even with a requirement of infinite smoothness.

  2. Given the y error bars, how should you choose the (x_i, y_i)? Again, there are infinitely many possibilities. You may need to add further criteria to make the choice unique, and additional criteria would probably make the problem even harder than it currently is.

Comment 2: Here is how you could use monotone cubic interpolation and still satisfy criteria 4 and 5:

Monotone cubic interpolation (call the result f) maps R → R.

Let CDF(x) = exp(-exp(f(x))). Then CDF: R → (0,1). If we can find a suitable f, then by defining CDF this way we can satisfy criteria 4 and 5.

To find f, transform the CDF constraint points (x_0, y_0), ..., (x_n, y_n) using the transformation xhat_i = x_i, yhat_i = log(-log(y_i)). This is the inverse of the CDF definition above. Note that if the y_i are increasing, the yhat_i are decreasing.

Now apply monotone cubic interpolation to the data points (xhat_i, yhat_i) to generate f. Finally, define CDF(x) = exp(-exp(f(x))). The result is a monotonically increasing function from R to (0,1) that passes through the points (x_i, y_i).

This, I think, satisfies criteria 2-5. Criterion 1 is arguably satisfied, although smoother solutions may of course exist.
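A minimal Mathematica sketch of this transform (an illustration, not code from the original answer), reusing the hypothetical monotoneCubic helper sketched above and taking the midpoints of the question's error bars as the (x_i, y_i); the {0, 0, 0} constraint has to be dropped because Log[-Log[0]] is undefined:

 pts = ({#1, Mean[{#2, #3}]} &) @@@ Rest[constraints];   (* drop the y = 0 point *)
 fhat = monotoneCubic[({#1, Log[-Log[#2]]} &) @@@ pts];  (* yhat_i = log(-log(y_i)) *)
 cdf[x_] := Exp[-Exp[fhat[x]]]                           (* maps back into (0, 1) *)

Within the data range, cdf passes through each midpoint exactly; outside it, the InterpolatingFunction extrapolates, so the limiting behavior at ±infinity depends on that extrapolation.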

+5

I found a solution that gives reasonable results for a variety of inputs. I start by fitting the model twice: once to the lower bounds of the constraints, and again to the upper bounds. I call the average of those two fitted functions the "ideal function". I use the ideal function to extrapolate to the left and right of where the constraints end, and to interpolate across any gaps in the constraints. I compute values of the ideal function at regular intervals, including at all the constraints, starting where the function is almost zero on the left and ending where it is almost one on the right. At the constraints, I clip these values as needed to satisfy them. Finally, I construct an interpolating function that passes through those values.

My Mathematica implementation is as follows.
First, a couple of helper functions:

 (* Distance from x to the nearest member of list l. *)
 listdist[x_, l_List] := Min[Abs[x - #] & /@ l]

 (* Return a value x for the variable var such that expr /. var -> x is at
    least (or at most, if dir is -1) t. *)
 invertish[expr_, var_, t_, dir_:1] := Module[{x = dir},
   While[dir*(expr /. var -> x) < dir*t, x *= 2];
   x]

And here is the main function:

 (* Return a non-decreasing interpolating function that maps from the reals
    to [0,1] and that is as close as possible to expr[var] without violating
    the given constraints (a list of {x,ymin,ymax} triples).
    The model, expr, will have free parameters, params, so first do a model
    fit to choose the parameters to satisfy the constraints as well as
    possible. *)
 cfit[constraints_, expr_, params_, var_] :=
   Block[{xlist, bots, tops, loparams, hiparams, lofit, hifit, xmin, xmax,
       gap, aug, bests},
     xlist = First /@ constraints;
     bots = Most /@ constraints;  (* bottom points of the constraints *)
     tops = constraints /. {x_, _, ymax_} -> {x, ymax};
     (* fit a model to the lower bounds of the constraints, and to the upper bounds *)
     loparams = FindFit[bots, expr, params, var];
     hiparams = FindFit[tops, expr, params, var];
     lofit[z_] = (expr /. loparams /. var -> z);
     hifit[z_] = (expr /. hiparams /. var -> z);
     (* find x-values where the fitted function is very close to 0 and to 1 *)
     {xmin, xmax} = {
       Min@Append[xlist, invertish[expr /. hiparams, var, 10^-6, -1]],
       Max@Append[xlist, invertish[expr /. loparams, var, 1 - 10^-6]]};
     (* the smallest gap between x-values in the constraints *)
     gap = Min[(#2 - #1 &) @@@ Partition[Sort[xlist], 2, 1]];
     (* augment the constraints to fill in any gaps and extrapolate so there
        are constraints everywhere from where the function is almost 0 to
        where it is almost 1 *)
     aug = SortBy[Join[constraints,
         Select[Table[{x, lofit[x], hifit[x]}, {x, xmin, xmax, gap}],
           listdist[#[[1]], xlist] > gap &]], First];
     (* pick a y-value from each constraint that is as close as possible to
        the mean of lofit and hifit *)
     bests = ({#1, Clip[(lofit[#1] + hifit[#1])/2, {#2, #3}]} &) @@@ aug;
     Interpolation[bests, InterpolationOrder -> 3]]

For example, we can fit a lognormal, a normal, or a logistic function:

 g1 = cfit[constraints, CDF[LogNormalDistribution[mu, sigma], z], {mu, sigma}, z]
 g2 = cfit[constraints, CDF[NormalDistribution[mu, sigma], z], {mu, sigma}, z]
 g3 = cfit[constraints, 1/(1 + c*Exp[-k*z]), {c, k}, z]

Here's what they look like for my original list of constraints:

[Image: constrained fits to lognormal, normal, and logistic functions (source: yootles.com)]
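A comparison plot like the one above can be produced with something along these lines (a sketch; the exact plotting call isn't shown here):

 (* overlay the three fitted curves on the constraint error bars *)
 Show[Plot[{g1[x], g2[x], g3[x]}, {x, 0, 10}],
   Graphics[Line[{{#1, #2}, {#1, #3}}] & @@@ constraints]]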

The normal and logistic fits are nearly on top of each other; the lognormal is the blue curve.

It's not quite perfect, though. In particular, the fits are not perfectly monotone. Here is a plot of their derivatives:

 Plot[{g1'[x], g2'[x], g3'[x]}, {x, 0, 10}] 

[Image: the derivatives of the fitted functions (source: yootles.com)]

That shows some lack of smoothness as well as slight non-monotonicity near zero. I welcome improvements to this solution!

+4

You could try fitting a Bézier curve through the middle points. In particular, I think you want a C2-continuous curve.
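For what it's worth, a rough Mathematica sketch of that idea (an illustration; no code was given), using the mids list from the question. Note that BezierFunction treats the points as control points, so the resulting curve only interpolates the two endpoints and is parametric rather than y as a function of x:

 bez = BezierFunction[mids];        (* control points = the error-bar midpoints *)
 ParametricPlot[bez[t], {t, 0, 1}]  (* the curve {x(t), y(t)} for t in [0, 1] *)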

0
