Plotting content from multiple data frames on one ggplot2 surface

Question

I am a complete R newbie, so please pitch the difficulty of your answers accordingly.

I use the ROCR package in R to generate the plot data for ROC curves, then I use ggplot2 to draw the graph. Something like this:

    library(ggplot2)
    library(ROCR)

    # Read one file of space-separated (score, label) pairs
    inputFile <- read.csv("path/to/file", header=FALSE, sep=" ",
                          colClasses=c('numeric','numeric'),
                          col.names=c('score','label'))

    # Compute the AUC and the ROC curve coordinates
    predictions <- prediction(inputFile$score, inputFile$label)
    auc <- performance(predictions, measure="auc")@y.values[[1]]
    rocData <- performance(predictions, "tpr", "fpr")
    rocDataFrame <- data.frame(x=rocData@x.values[[1]], y=rocData@y.values[[1]])

    # Draw the curve, then print the AUC in the bottom-right corner
    rocr.plot <- ggplot(data=rocDataFrame, aes(x=x, y=y)) + geom_path(size=1)
    rocr.plot <- rocr.plot +
      geom_text(aes(x=1, y=0, hjust=1, vjust=0,
                    label=paste(sep="", "AUC = ", round(auc,4))),
                colour="black", size=4)

This works well for drawing a single ROC curve. However, what I would like to do is read in a whole directory of input files, one file per classifier test result, and make a faceted ggplot2 graph of all the ROC curves, while also printing the AUC score on each graph.

I would like to understand what the "right" R-style approach is to achieve this. I am sure I can hack something together by looping over all the files in the directory and creating a separate data frame for each, then looping again to create several graphs and somehow getting ggplot2 to display all of them on the same surface. However, that does not let me use ggplot2's built-in faceting, which I believe is the right approach. I am not sure how to get my data into the proper form for faceting. Should I merge all my data frames into one and give each merged chunk a name (for example, the file name) to facet on? If so, is there a library or recommended practice for this?

Your suggestions are welcome. I'm still learning the best practices in R, so I would prefer expert advice over hacking together code that looks like the usual declarative programming languages I'm used to.

EDIT: What I understand least is whether, using ggplot2's built-in faceting, I can print a custom string (the AUC score) on each plot it generates.

r ggplot2

Inverseofverse Aug 08 '12 at 7:00

1 answer

Andrie · Accepted Answer · 2012-08-08T07:46:05+0000

Here is an example of how to create a plot like the one you describe. I use the built-in quakes dataset:

The code performs the following actions:

Load ggplot2 and plyr
Add a facet variable to quakes; in this case, I bin the earthquake depth into five levels
Use ddply to compute the mean magnitude for each depth level
Use ggplot with geom_text to print that mean on each facet

The code:

    library(plyr)
    library(ggplot2)

    quakes$level <- cut(quakes$depth, 5,
                        labels=c("Very Shallow", "Shallow", "Medium", "Deep", "Very Deep"))

    quakes.summary <- ddply(quakes, .(level), summarise, mag=round(mean(mag), 1))

    ggplot(quakes, aes(x=long, y=lat)) +
      geom_point(aes(colour=mag)) +
      geom_text(aes(label=mag), data=quakes.summary, x=185, y=-35) +
      facet_grid(~level) +
      coord_map()
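The same pattern could be carried over to the ROC case in the question. What follows is only a minimal sketch under stated assumptions, not a tested solution for your data: the helper `roc_for_file`, the objects `roc_all` and `auc_labels`, and the simulated `run1.txt` to `run3.txt` files are all made up for illustration. The idea is to build one long data frame with a `file` column to facet on, plus a second one-row-per-file data frame that carries the AUC label for `geom_text`.

```r
library(ROCR)
library(ggplot2)

# Hypothetical helper: read one (score, label) file and return its ROC
# points, tagged with the file name and that file's AUC.
roc_for_file <- function(path) {
  input <- read.csv(path, header = FALSE, sep = " ",
                    colClasses = c("numeric", "numeric"),
                    col.names = c("score", "label"))
  pred <- prediction(input$score, input$label)
  perf <- performance(pred, "tpr", "fpr")
  auc  <- performance(pred, measure = "auc")@y.values[[1]]
  data.frame(file = basename(path),
             x = perf@x.values[[1]],
             y = perf@y.values[[1]],
             auc = auc)
}

# Simulated input so the sketch runs on its own (an assumption, not real
# classifier output). For real data, point list.files() at your directory.
set.seed(1)
demo_dir <- file.path(tempdir(), "roc_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  label <- rbinom(200, 1, 0.5)
  score <- label * i / 2 + rnorm(200)
  write.table(data.frame(score, label),
              file.path(demo_dir, paste0("run", i, ".txt")),
              sep = " ", row.names = FALSE, col.names = FALSE)
}
files <- list.files(demo_dir, full.names = TRUE)

# One long data frame: every ROC point from every file, with a 'file' column
roc_all <- do.call(rbind, lapply(files, roc_for_file))

# One row per file for the text layer, positioned at the bottom-right corner
auc_labels <- unique(roc_all[, c("file", "auc")])
auc_labels$label <- paste0("AUC = ", round(auc_labels$auc, 4))
auc_labels$x <- 1
auc_labels$y <- 0

# Facet on the file name; the text layer uses its own data frame, so each
# panel shows only its own AUC
roc_plot <- ggplot(roc_all, aes(x = x, y = y)) +
  geom_path() +
  geom_text(aes(label = label), data = auc_labels,
            hjust = 1, vjust = 0, size = 4) +
  facet_wrap(~ file)
```

Typing `roc_plot` at the console draws the panels. Because `auc_labels` has the same `file` column as `roc_all`, ggplot2 routes each label to the matching facet, which is the same mechanism `quakes.summary` relies on above.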
