How to fit a smooth curve to my data in R?

Question


I am trying to draw a smooth curve in R. I have the following simple toy data:

 > x
 [1]  1  2  3  4  5  6  7  8  9 10
 > y
 [1]  2  4  6  8  7 12 14 16 18 20

Now, when I plot it with a standard command, it looks bumpy and angular, of course:

 > plot(x,y, type='l', lwd=2, col='red')

How can I make the curve smooth so that the 3 edges are rounded using estimated values? I know there are many methods to fit a smooth curve, but I'm not sure which one is most appropriate for this type of curve or how to write it in R.

+76

r plot curve-fitting

Frank Aug 13 '10 at 20:18


8 answers

Perhaps smooth.spline is an option. It lets you set a smoothing parameter (typically between 0 and 1):

 smoothingSpline = smooth.spline(x, y, spar=0.35)
 plot(x,y)
 lines(smoothingSpline)

You can also use predict on smooth.spline objects. The function ships with base R; see ?smooth.spline for details.
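The predict step mentioned above can be sketched as follows. This is a minimal illustration on the question's data; the names fit, grid and pred, and the choice of 200 grid points, are assumptions made for the example.

```r
x <- 1:10
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)

# Fit the smoothing spline with the same spar as in the answer
fit <- smooth.spline(x, y, spar = 0.35)

# Evaluate the fitted spline on a fine grid (200 points is an arbitrary choice)
grid <- seq(min(x), max(x), length.out = 200)
pred <- predict(fit, grid)   # a list with components x and y

plot(x, y)
lines(pred, col = "red", lwd = 2)
```

Evaluating on a grid is mainly useful when you need the smoothed values themselves, for example at x positions that were not in the data.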

+57

Karsten W. Aug. 14 '10 at 0:51


To get this REALLY smoooth ...

 x <- 1:10
 y <- c(2,4,6,8,7,8,14,16,18,20)
 lo <- loess(y~x)
 plot(x,y)
 xl <- seq(min(x),max(x), (max(x) - min(x))/1000)
 lines(xl, predict(lo,xl), col='red', lwd=2)

This style interpolates a lot of extra points and gives you a curve that is very smooth. It also appears to be the approach ggplot takes. If the standard level of smoothness is fine, you can simply use:

 scatter.smooth(x, y)
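If the default smoothness is not what you want, both loess and scatter.smooth take a span argument that controls how much of the data each local fit uses. The sketch below compares two spans on the same data; the value 1.5 and the variable names are illustrative assumptions, not recommendations.

```r
x <- 1:10
y <- c(2, 4, 6, 8, 7, 8, 14, 16, 18, 20)
xl <- data.frame(x = seq(min(x), max(x), length.out = 200))

tight <- loess(y ~ x, span = 0.75)  # the loess default
loose <- loess(y ~ x, span = 1.5)   # wider window, flatter curve

predTight <- predict(tight, xl)
predLoose <- predict(loose, xl)

plot(x, y)
lines(xl$x, predTight, col = "red", lwd = 2)
lines(xl$x, predLoose, col = "blue", lwd = 2, lty = 2)
```

scatter.smooth(x, y, span = 1.5) takes the same kind of argument, although its default span and degree differ from those of loess.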

+26

John Aug 14 '10 at 2:15


The qplot() function in the ggplot2 package is very easy to use and provides an elegant solution that includes confidence bands. For example,

 qplot(x,y, geom='smooth', span =0.5)

produces a smoothed curve with a confidence band (image not included).

+24

Underminer Apr 16 '15 at 2:04


LOESS is a very good approach, as Dirk said.

Another option is to use Bezier splines, which in some cases may work better than LOESS if you have few data points.

Here you will find an example: http://rosettacode.org/wiki/Cubic_bezier_curves#R

 # x, y: the x and y coordinates of the hull points
 # n: the number of points in the curve.
 bezierCurve <- function(x, y, n=10) {
   outx <- NULL
   outy <- NULL
   i <- 1
   for (t in seq(0, 1, length.out=n)) {
     b <- bez(x, y, t)
     outx[i] <- b$x
     outy[i] <- b$y
     i <- i+1
   }
   return (list(x=outx, y=outy))
 }

 # Evaluate the Bezier curve at parameter t using the Bernstein polynomials
 bez <- function(x, y, t) {
   outx <- 0
   outy <- 0
   n <- length(x)-1
   for (i in 0:n) {
     outx <- outx + choose(n, i)*((1-t)^(n-i))*t^i*x[i+1]
     outy <- outy + choose(n, i)*((1-t)^(n-i))*t^i*y[i+1]
   }
   return (list(x=outx, y=outy))
 }

 # Example usage
 x <- c(4,6,4,5,6,7)
 y <- 1:6
 plot(x, y, "o", pch=20)
 points(bezierCurve(x,y,20), type="l", col="red")

+12

nico Aug 13 '10 at 21:58


The other answers are all good approaches. However, there are a few other options in R that have not been mentioned, including lowess and approx, which may give better fits or faster performance.

The benefits are easier to demonstrate with an alternative dataset:

 sigmoid <- function(x) {
   y <- 1/(1+exp(-.15*(x-100)))
   return(y)
 }

 dat <- data.frame(x=rnorm(5000)*30+100)
 dat$y <- as.numeric(as.logical(round(sigmoid(dat$x)+rnorm(5000)*.3,0)))

Here is the data superimposed on the sigmoid curve that generated it:

This kind of data is common when looking at a binary behavior across a population. For example, this could be a plot of whether or not a customer purchased something (a binary 1/0 on the y-axis) versus the amount of time they spent on the site (x-axis).

A large number of points are used to better demonstrate the differences in the performance of these functions.

smooth, spline and smooth.spline all produce gibberish on a dataset like this with every set of parameters I tried, possibly because of their tendency to map to each point, which doesn't work for noisy data.

The functions loess, lowess and approx all give usable results, although only barely for approx. This is the code for each, using lightly optimized parameters:
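The "map to each point" behaviour can be checked directly on a small example. The sketch below uses the question's ten points rather than the noisy dataset, and it only demonstrates the interpolating spline function with its default method; smooth.spline behaves differently depending on its smoothing parameter, and spar = 0.8 is an arbitrary value chosen for contrast.

```r
x <- 1:10
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)

interp   <- spline(x, y, xout = x)           # interpolating cubic spline
smoothed <- smooth.spline(x, y, spar = 0.8)  # penalised smoothing spline

# Largest distance between the fitted curve and the observations
errInterp <- max(abs(interp$y - y))                # effectively zero
errSmooth <- max(abs(predict(smoothed, x)$y - y))  # clearly above zero
c(interp = errInterp, smooth = errSmooth)
```

An interpolating curve reproduces every observation, noise included, which is why it is a poor match for binary or very noisy responses.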

 loessFit <- loess(y~x, dat, span = 0.6)
 loessFit <- data.frame(x=loessFit$x,y=loessFit$fitted)
 loessFit <- loessFit[order(loessFit$x),]

 approxFit <- approx(dat,n = 15)

 lowessFit <- data.frame(lowess(dat,f = .6,iter=1))

And the results:

 plot(dat,col='gray')
 curve(sigmoid,0,200,add=TRUE,col='blue')
 lines(lowessFit,col='red')
 lines(loessFit,col='green')
 lines(approxFit,col='purple')
 legend(150,.6,
        legend=c("Sigmoid","Loess","Lowess",'Approx'),
        lty=c(1,1),
        lwd=c(2.5,2.5),col=c("blue","green","red","purple"))

As you can see, lowess produces a nearly perfect fit to the original generating curve. loess is close, but shows a strange deviation at both tails.

Although your dataset will be very different, I have found that other datasets behave similarly, with both loess and lowess capable of producing good results. The differences become more significant when you look at benchmarks:

 > microbenchmark::microbenchmark(loess(y~x, dat, span = 0.6),approx(dat,n = 20),lowess(dat,f = .6,iter=1),times=20)
 Unit: milliseconds
                            expr        min         lq       mean     median        uq        max neval cld
   loess(y ~ x, dat, span = 0.6) 153.034810 154.450750 156.794257 156.004357 159.23183 163.117746    20   c
             approx(dat, n = 20)   1.297685   1.346773   1.689133   1.441823   1.86018   4.281735    20 a
  lowess(dat, f = 0.6, iter = 1)   9.637583  10.085613  11.270911  11.350722  12.33046  12.495343    20  b

loess is extremely slow, taking roughly 100x as long as approx. lowess produces better results than approx while still running fairly quickly (about 15x faster than loess).

loess also becomes increasingly bogged down as the number of points grows, becoming unusable at around 50,000.

EDIT: Further research shows that loess gives better fits for some datasets. If you are dealing with a small dataset, or performance is not a consideration, try both functions and compare the results.
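One way to "try both and compare" is to measure each fit against a known curve on simulated data. The sketch below reuses the answer's data-generating recipe with 500 points instead of 5000; the seed, the sample size and the mse helper are assumptions introduced for this example, and the numbers it prints will vary with those choices.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

sigmoid <- function(x) 1 / (1 + exp(-0.15 * (x - 100)))

n <- 500
x <- sort(rnorm(n) * 30 + 100)
y <- as.numeric(as.logical(round(sigmoid(x) + rnorm(n) * 0.3, 0)))

lowessFit <- lowess(x, y, f = 0.6, iter = 1)  # returns x (sorted) and fitted y
loessFit  <- loess(y ~ x, span = 0.6)

# Hypothetical helper: mean squared error against the generating sigmoid
mse <- function(est) mean((est - sigmoid(x))^2)

c(lowess = mse(lowessFit$y), loess = mse(fitted(loessFit)))
```

With real data there is no generating curve to compare against, so in practice the comparison is usually visual, or based on held-out data.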

+9

Craig Feb 17 '17 at 16:51


In ggplot2 you can do smoothing in several ways, for example:

 library(ggplot2)

 ggplot(mtcars, aes(wt, mpg)) +
   geom_point() +
   geom_smooth(method = "gam", formula = y ~ poly(x, 2))

 ggplot(mtcars, aes(wt, mpg)) +
   geom_point() +
   geom_smooth(method = "loess", span = 0.3, se = FALSE)

+3

jsb Jan 10 '18 at 1:41


I didn't see this method shown, so in case someone else is looking to do this: the ggplot documentation suggests using the gam method, which produces results similar to loess when working with small datasets.

 library(ggplot2)
 x <- 1:10
 y <- c(2,4,6,8,7,8,14,16,18,20)
 df <- data.frame(x,y)
 r <- ggplot(df, aes(x = x, y = y)) +
   geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs")) +
   geom_point()
 r

First, using the loess method and the automatic formula. Second, using the gam method with the suggested formula. (Images not included.)

0

Adam Bunn Apr 02 '19 at 21:15


Dirk Eddelbuettel · Accepted Answer · 2010-08-13 20:28

I like loess() a lot for smoothing:

 x <- 1:10
 y <- c(2,4,6,8,7,12,14,16,18,20)
 lo <- loess(y~x)
 plot(x,y)
 lines(predict(lo), col='red', lwd=2)

Venables and Ripley's MASS book has an entire section on smoothing that also covers splines and polynomials, but loess() is just about everybody's favourite.
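One detail worth noting: lines(predict(lo)) plots the fitted values against their index, which lines up with the data here only because x is 1:10. For other x values, passing x explicitly is safer. A minimal sketch, with x rescaled and shuffled purely for illustration (the rescaling, the seed and the ord variable are assumptions):

```r
set.seed(42)                 # arbitrary seed for the shuffle
idx <- sample(10)

x <- seq(10, 100, by = 10)[idx]             # x is neither 1:n nor sorted
y <- c(2, 4, 6, 8, 7, 12, 14, 16, 18, 20)[idx]

lo <- loess(y ~ x)
fit <- predict(lo)           # fitted values, in the same order as x

ord <- order(x)              # sort so the line is drawn left to right
plot(x, y)
lines(x[ord], fit[ord], col = "red", lwd = 2)
```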
