Curve fitting for specific data

I have the following data in my dissertation:

    28    45
    91    14
    102   11
    393   5
    4492  1.77

I need to fit a curve to it. If I plot it, this is what I get.

[Image: plot of the data]

I think some kind of exponential curve should fit this data. I am using gnuplot. Can someone tell me which curve will fit this and what initial parameters I can use?

+4
2 answers

Just in case R is an option, here is a sketch of two methods you could use.

First method: evaluate the goodness of fit of a set of candidate models. This is probably the best way, because it takes advantage of what you already know or expect about the relationship between the variables.

    # read in the data
    dat <- read.table(text = "x y
    28 45
    91 14
    102 11
    393 5
    4492 1.77", header = TRUE)

    # quick visual inspection
    plot(dat); lines(dat)

[Image: the data points joined by lines]

    # a smattering of possible models... just made up on the spot
    # with more effort some better candidates should be added
    models <- list(
      lm(y ~ x, data = dat),
      lm(y ~ I(1/x), data = dat),
      lm(y ~ log(x), data = dat),
      nls(y ~ I(1/x*a) + b*x, data = dat, start = list(a = 1, b = 1)),
      nls(y ~ (a + b*log(x)), data = dat,
          start = setNames(coef(lm(y ~ log(x), data = dat)), c("a", "b"))),
      nls(y ~ I(exp(1)^(a + b * x)), data = dat, start = list(a = 0, b = 0)),
      nls(y ~ I(1/x*a) + b, data = dat, start = list(a = 1, b = 1))
    )

    # have a quick look at the visual fit of these models
    # (in current ggplot2, nls starting values are passed via method.args)
    library(ggplot2)
    ggplot(dat, aes(x, y)) + geom_point(size = 5) +
      stat_smooth(method = "lm", formula = as.formula(models[[1]]),
                  size = 1, se = FALSE, colour = "black") +
      stat_smooth(method = "lm", formula = as.formula(models[[2]]),
                  size = 1, se = FALSE, colour = "blue") +
      stat_smooth(method = "lm", formula = as.formula(models[[3]]),
                  size = 1, se = FALSE, colour = "yellow") +
      stat_smooth(method = "nls", formula = as.formula(models[[4]]), data = dat,
                  method.args = list(start = list(a = 0, b = 0)),
                  size = 1, se = FALSE, colour = "red") +
      stat_smooth(method = "nls", formula = as.formula(models[[5]]), data = dat,
                  method.args = list(start = setNames(coef(lm(y ~ log(x), data = dat)), c("a", "b"))),
                  size = 1, se = FALSE, colour = "green") +
      stat_smooth(method = "nls", formula = as.formula(models[[6]]), data = dat,
                  method.args = list(start = list(a = 0, b = 0)),
                  size = 1, se = FALSE, colour = "violet") +
      stat_smooth(method = "nls", formula = as.formula(models[[7]]), data = dat,
                  method.args = list(start = list(a = 0, b = 0)),
                  size = 1, se = FALSE, colour = "orange")

[Image: the data with the seven fitted model curves overlaid]

The orange curve looks pretty good. Let's see how it ranks when we measure the relative goodness of fit of these models...

    # calculate the AIC and AICc (for small samples) for each
    # model to see which one is best, i.e. has the lowest AIC
    library(AICcmodavg); library(plyr); library(stringr)
    ldply(models, function(mod){
      data.frame(AICc = AICc(mod), AIC = AIC(mod), model = deparse(formula(mod)))
    })

          AICc      AIC                     model
    1 70.23024 46.23024                     y ~ x
    2 44.37075 20.37075                y ~ I(1/x)
    3 67.00075 43.00075                y ~ log(x)
    4 43.82083 19.82083    y ~ I(1/x * a) + b * x
    5 67.00075 43.00075      y ~ (a + b * log(x))
    6 52.75748 28.75748 y ~ I(exp(1)^(a + b * x))
    7 44.37075 20.37075        y ~ I(1/x * a) + b

    # y ~ I(1/x * a) + b * x is the best model of those tried here for this curve
    # it fits nicely on the plot and has the best goodness of fit statistic
    # no doubt with a better understanding of nls and the data a better fitting
    # function could be found. Perhaps the optimisation method here might be
    # useful also: http://stats.stackexchange.com/a/21098/7744
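For anyone wondering where the numbers in that table come from, here is a minimal sketch in plain Python (not part of the R answer) that refits the three candidates that are linear in their parameters and computes AIC and AICc by hand. It assumes the Gaussian log-likelihood that R uses for `lm` fits, and it counts k = 3 parameters per model (two coefficients plus the residual variance). The helper names `fit_line`, `gaussian_aic` and `aicc` are made up for this illustration.

```python
import math

# Data from the question (x, y pairs).
xs = [28, 91, 102, 393, 4492]
ys = [45, 14, 11, 5, 1.77]


def fit_line(us, ys):
    """Ordinary least squares for y = intercept + slope * u.

    Returns (intercept, slope, residual sum of squares).
    """
    n = len(us)
    u_mean = sum(us) / n
    y_mean = sum(ys) / n
    s_uu = sum((u - u_mean) ** 2 for u in us)
    s_uy = sum((u - u_mean) * (y - y_mean) for u, y in zip(us, ys))
    slope = s_uy / s_uu
    intercept = y_mean - slope * u_mean
    rss = sum((y - (intercept + slope * u)) ** 2 for u, y in zip(us, ys))
    return intercept, slope, rss


def gaussian_aic(rss, n, k):
    """AIC from the Gaussian log-likelihood; k includes the error variance."""
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_lik


def aicc(aic, n, k):
    """Small-sample correction: AICc = AIC + 2k(k+1)/(n-k-1)."""
    return aic + 2 * k * (k + 1) / (n - k - 1)


# Each candidate is a straight line in a transformed x.
transforms = {
    "y ~ x": lambda x: x,
    "y ~ 1/x": lambda x: 1 / x,
    "y ~ log(x)": math.log,
}

n, k = len(xs), 3  # assumption: two coefficients + residual variance
results = {}
for name, transform in transforms.items():
    _, _, rss = fit_line([transform(x) for x in xs], ys)
    aic = gaussian_aic(rss, n, k)
    results[name] = (aicc(aic, n, k), aic)

# Print the models from best (lowest AIC) to worst.
for name, (aicc_value, aic_value) in sorted(results.items(), key=lambda item: item[1][1]):
    print(f"{name:12s} AICc={aicc_value:9.5f} AIC={aic_value:9.5f}")
```

With n = 5 and k = 3 the correction term 2k(k+1)/(n-k-1) is exactly 24, which is the constant gap between the AICc and AIC columns in the table. With so few points the correction is large, so treat the ranking as a rough guide rather than a verdict.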

Second method: use genetic programming to search a vast number of models. This seems to be a kind of wild shot-in-the-dark approach to curve fitting. You don't have to specify much at the start, although perhaps I'm doing it wrong...

    # symbolic regression using Genetic Programming
    # http://rsymbolic.org/projects/rgp/wiki/Symbolic_Regression
    library(rgp)
    # this will probably take some time and throw
    # a lot of warnings...
    result1 <- symbolicRegression(y ~ x,
                 data = dat, functionSet = mathFunctionSet,
                 stopCondition = makeStepsStopCondition(2000))
    # inspect results, they'll be different every time...
    (symbreg <- result1$population[[which.min(sapply(result1$population, result1$fitnessFunction))]])

    # output:
    # function (x) tan((x - x + tan(x)) * x)
    # quite bizarre...

    # inspect visual fit
    ggplot() + geom_point(data = dat, aes(x, y), size = 3) +
      geom_line(data = data.frame(symbx = dat$x, symby = sapply(dat$x, symbreg)),
                aes(symbx, symby), colour = "red")

[Image: the data with the symbolic-regression curve in red]

Actually a very poor visual fit. It may take a bit more effort to get quality results from genetic programming...

Credits: fooobar.com/questions/1456297 / ... , fooobar.com/questions/1456299 / ...

+30

Do you know of an analytic function that the data should follow? If so, it can help you choose the form of the function to fit to the data.

Otherwise, since the data looks like an exponential decay, try something like this in gnuplot, where a function with two free parameters is fitted to the data:

    f(x) = exp(-x*c)*b
    fit f(x) "data.dat" u 1:2 via b,c
    plot "data.dat" w p, f(x)

Gnuplot will vary the parameters named after the 'via' clause to find the best fit. Statistics are printed on stdout and also written to a file called "fit.log" in the current working directory.

The variable c determines the curvature (the decay rate), and the variable b scales all the values linearly to match the magnitude of the data.
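On the question of initial parameters: the fit starts from whatever values b and c hold when `fit` is called, and with x values running into the thousands a poor starting c can make exp(-x*c) vanish for every point. One common way to get rough starting values, sketched here in plain Python (not something the answer above prescribes), is to fit a straight line to ln(y) against x, since ln(y) = ln(b) - c*x. The function name `exponential_start_values` is made up for this illustration, and the approach assumes every y is positive.

```python
import math

# Data from the question (x, y pairs).
xs = [28, 91, 102, 393, 4492]
ys = [45, 14, 11, 5, 1.77]


def exponential_start_values(xs, ys):
    """Estimate b and c in y = b * exp(-c * x) from a line fit of ln(y) vs x."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]  # assumption: every y is > 0
    x_mean = sum(xs) / n
    log_y_mean = sum(log_ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (ly - log_y_mean) for x, ly in zip(xs, log_ys))
    slope = s_xy / s_xx
    intercept = log_y_mean - slope * x_mean
    # ln(y) = ln(b) - c * x, so b = exp(intercept) and c = -slope.
    return math.exp(intercept), -slope


b0, c0 = exponential_start_values(xs, ys)

# Printed as plain assignments that can be placed before the fit command.
print(f"b = {b0:.6g}")
print(f"c = {c0:.6g}")
```

These are only starting guesses. The log transform gives the small y values more weight than a direct least-squares fit would, so the fitted b and c will normally move away from them.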

See the curve fitting section (the fit command) in the gnuplot documentation for more information.

+5