How to make a Pareto chart (aka rank-order chart) with ggplot2

Question


I came across a rank-order chart (also known as a Pareto chart) in the book Data Analysis with Open Source Tools, so I tried to reproduce the book's example using ggplot2.

The figure from the book is shown below. Note that the axes are flipped so that the country names run along the y-axis, which makes them easier to read. The line shows the CDF (cumulative distribution function) of the data.

Rank-order chart (source: Data Analysis with Open Source Tools)

Here is some partially simulated data:

    library(data.table)

    country = c('US', 'Brazil', 'Japan', 'India', 'Germany', 'UK', 'Russia', 'France')
    sales = c(40, 14, 7, 6, 2.8, 2, 1.8, 1)  # The data is already sorted
    df = data.table(country=country, sales=sales)

Then I used stat_ecdf in ggplot2 to build the CDF:

    library(ggplot2)
    ggplot(data=df) + stat_ecdf(aes(x=sales))

But in the resulting figure, the x-axis represents sales rather than the countries.
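stat_ecdf() computes the empirical CDF of whatever variable is mapped to x, so its x-axis is necessarily the sales values, and the y-axis is the fraction of countries whose sales are at or below each value. A minimal base-R sketch of the same computation, using the question's sales vector and the standard ecdf() function (the name sales_cdf is just for illustration):

```r
# Same simulated sales figures as in the question
sales <- c(40, 14, 7, 6, 2.8, 2, 1.8, 1)

# ecdf() returns a step function: share of observations <= x
sales_cdf <- ecdf(sales)

# The function is evaluated at sales values, not at countries
sales_cdf(c(1, 7, 40))
# [1] 0.125 0.750 1.000
```

Only one of the eight values is at or below 1, six are at or below 7, and all eight are at or below 40, which is why the curve runs over sales and ends at 1 for the largest value.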

I found another implementation here, but it plots an explicitly computed cumulative total, which looks quite different from the example in the book.

Is there a way to build a Pareto chart like the one in the first figure?

EDIT

I misdescribed the line in the chart: it is not a CDF but a cumulative proportion.

A CDF maps each value to its percentile rank, so the US, with the largest sales, has a percentile rank of 100. In the rank-order chart, however, the US sits at about 45%, meaning that US sales account for about 45% of total sales.
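The difference can be checked numerically. This sketch uses the question's simulated numbers (not the book's data) and base R only: cumsum(sales) / sum(sales) gives the cumulative proportion that a rank-order chart plots, while ecdf() gives the percentile rank. The variable names cum_share and ecdf_us are illustrative.

```r
# Question's data, already sorted by decreasing sales
country <- c('US', 'Brazil', 'Japan', 'India', 'Germany', 'UK', 'Russia', 'France')
sales   <- c(40, 14, 7, 6, 2.8, 2, 1.8, 1)

# Cumulative proportion of total sales, in rank order
cum_share <- cumsum(sales) / sum(sales)

# Percentile rank (ECDF value) of the largest sales figure, i.e. the US
ecdf_us <- ecdf(sales)(max(sales))

round(setNames(cum_share, country), 3)
ecdf_us
```

With these simulated numbers the US share is 40 / 74.6, roughly 54% rather than the book's 45%, because the data differ; the ECDF value for the US is 1 either way.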

So stat_ecdf is the wrong tool for building a rank-order chart.



Zelong Jun 12 '15 at 23:56


1 answer

josliber · Accepted Answer · 2015-06-13T00:34:40+0000

There is a good discussion of why plotting with two different y-axes is a bad idea. I would plot sales and the cumulative percentage separately and display them next to each other to give a complete visual representation of the Pareto chart.

    # Sales
    library(ggplot2)
    df <- data.frame(country, sales)
    df <- df[order(df$sales, decreasing=TRUE),]
    # Order countries by sales, not alphabetically
    df$country <- factor(df$country, levels=as.character(df$country))
    ggplot(df, aes(x=country, y=sales, group=1)) + geom_path()

    # Cumulative percentage
    df.pct <- df
    df.pct$pct <- 100*cumsum(df$sales)/sum(df$sales)
    ggplot(df.pct, aes(x=country, y=pct, group=1)) + geom_path() + ylim(0, 100)
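The two plots can also be drawn as one figure with two panels that share the country axis, which still keeps the y-axes separate. This is a sketch assuming ggplot2 is installed; reshaping into a long data frame and using facet_wrap() with scales = "free_x" are choices made for this sketch, not part of the original answer. Countries go on the y-axis to match the book's orientation.

```r
library(ggplot2)

# Question's data; sorting is done explicitly below
df <- data.frame(
  country = c('US', 'Brazil', 'Japan', 'India', 'Germany', 'UK', 'Russia', 'France'),
  sales   = c(40, 14, 7, 6, 2.8, 2, 1.8, 1),
  stringsAsFactors = FALSE
)
df <- df[order(df$sales, decreasing = TRUE), ]
df$pct <- 100 * cumsum(df$sales) / sum(df$sales)

# A discrete y-axis draws the first level at the bottom, so reverse the
# levels to put the top seller at the top, as in the book
df$country <- factor(df$country, levels = rev(df$country))

# Stack both measures into long format: one panel per measure
long <- rbind(
  data.frame(country = df$country, measure = "Sales", value = df$sales),
  data.frame(country = df$country, measure = "Cumulative %", value = df$pct)
)
long$measure <- factor(long$measure, levels = c("Sales", "Cumulative %"))

# geom_path() joins points in row order (highest to lowest sales);
# free_x gives each panel its own value range
p <- ggplot(long, aes(x = value, y = country, group = 1)) +
  geom_path() +
  geom_point() +
  facet_wrap(~ measure, scales = "free_x") +
  labs(x = NULL, y = NULL)

print(p)
```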
