ggplot: how to limit the output of a bar chart so that only the most frequent occurrences are displayed?

I have searched for hours for this seemingly simple thing, without success. I have a data frame in which one of the columns holds a country variable. I want two things:

  • Order the bars so that the most frequent countries appear at the top (a partial solution is below; EDIT: full solution found, see the answer on limiting the output of a bar chart based on frequency);
  • Show only the most frequent countries and lump the rest into an "Other" category.

I tried using table() or summary() with ggplot, but that did not work. Is this possible in ggplot at all, or do I have to use barchart? (I managed to do it with barchart simply by using summary(df$something) and adding max = x.) I also want to fill the bars by question (the different questions about the country).

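My code for putting the most frequent countries at the top:

 library(ggplot2)
 # ascending counts: after coord_flip() the most frequent country ends up at the top
 ggplot(aDDs, aes(x = factor(answer, levels = names(sort(table(answer), decreasing = FALSE))),
                  fill = question)) +
   geom_bar() +
   coord_flip()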
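To see why this ordering works, here is a minimal base-R sketch on a hypothetical `answer` vector (not the real aDDs data). `sort()` on a table orders the counts ascending (its argument is `decreasing`, default `FALSE`), so the most frequent country becomes the last factor level; after `coord_flip()`, ggplot2 draws the first level at the bottom and the last level at the top.

```r
# Hypothetical toy data standing in for aDDs$answer
answer <- c("Spain", "India", "India",
            "United States", "United States", "United States")

# Counts sorted in ascending order: least frequent first
counts <- sort(table(answer), decreasing = FALSE)

# Use the sorted names as the factor levels
f <- factor(answer, levels = names(counts))

print(counts)
print(levels(f))  # the most frequent country is the last level
```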

Suggestions are very welcome.

====== EDIT3: I continued working on the code based on @CMichael's suggestion, but then ran into another, rather strange problem. Since this ifelse issue is a somewhat different question from my original one, I posted it separately. Please check it here: R: ifelse function returns vector position instead of value (string)

====== EDIT:

An example of aDDs is reproduced below; the full aDDs dataset can be downloaded here:

 temp <- structure(list(student = c(2270285L, 2321254L, 75338L, 2071594L,1682771L, 1770356L, 2155693L, 3154864L, 3136979L, 2082311L),answer = structure(c(181L, 87L, 183L, 89L, 115L, 183L, 172L,180L, 175L, 125L), .Label = c("Congo", "Guinea-Bissau", "Solomon Islands","Central African Rep", "Comoros", "Equatorial Guinea", "Liechtenstein","Nauru", "Brunei", "Djibouti", "Kiribati", "Papua New Guinea","Samoa", "South Sudan", "Tajikistan", "Tonga", "Bhutan","Gabon", "Laos", "Lesotho", "Maldives", "Micronesia", "St Kitts and Nevis","Mozambique", "Niger", "Andorra", "Cape Verde", "Mauritania","Antigua and Deps", "Chad", "Guinea", "Malta", "Burundi","Eritrea", "Iceland", "Kyrgyzstan", "Turkmenistan", "Azerbaijan","Dominica", "Belize", "Malawi", "Mali", "Moldova", "Benin","Cuba", "Gambia", "Luxembourg", "St Lucia", "Angola", "Cambodia","Georgia", "Madagascar", "Oman", "Kosovo", "Kuwait", "Namibia","Bahrain", "Congo - Democratic Rep", "Montenegro", "Senegal","Sierra Leone", "Togo", "Botswana", "Fiji", "Libya", "Uzbekistan","Guyana", "Mongolia", "Somalia", "Zambia", "Estonia", "Ivory Coast","Myanmar", "Grenada", "Qatar", "Saint Vincent and the Grenadines","Tanzania", "Armenia", "Bahamas", "Belarus", "Burkina", "Liberia","Afghanistan", "Latvia", "Yemen", "Mauritius", "Albania","Barbados", "Iraq", "Macedonia", "Nicaragua", "Panama", "Slovenia","Lebanon", "Slovakia", "Kazakhstan", "Paraguay", "Korea South","Suriname", "Czech Republic", "Rwanda", "Haiti", "Lithuania","Israel", "Zimbabwe", "Cyprus", "Honduras", "Uruguay", "Syria","Finland", "Tunisia", "Taiwan", "Uganda", "Denmark", "Austria","Sri Lanka", "Vietnam", "Bosnia Herzegovina", "Thailand","Norway", "Trinidad and Tobago", "Switzerland", "Nepal","Sudan", "Jamaica", "Japan", "United Arab Emirates", "Bolivia","New Zealand", "Ethiopia", "Jordan", "Cameroon", "Croatia","Sweden", "Kenya", "Singapore", "Guatemala", "Ireland Republic","Saudi Arabia", "Bulgaria", "Malaysia", "Belgium", "Dominican Republic","Algeria", "El Salvador", 
"Bangladesh", "Serbia", "Ghana","Costa Rica", "Indonesia", "Hungary", "Venezuela", "Ecuador","Ukraine", "Romania", "Turkey", "China", "Morocco", "Russian Federation","Peru", "South Africa", "Argentina", "Portugal", "Iran","Poland", "Italy", "Chile", "France", "Germany", "Australia","Philippines", "Egypt", "Greece", "Nigeria", "Canada", "Pakistan","United Kingdom", "Mexico", "Colombia", "Brazil", "Netherlands","Spain", "India", "United States"), class = "factor"), question = c("C1-pres","C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres", "C1-pres","B1-pres", "B1-pres", "B1-pres")), .Names = c("student","answer", "question"), row.names = c("156", "203", "280", "347","412", "478", "534", "1649651", "1649691", "1649763"), class = "data.frame") 
2 answers

For the filtering part, you need to introduce a new column:

 # as.character() keeps ifelse() from returning the factor's integer codes
 data$filteredCountry <- ifelse(data$value > threshold, as.character(data$country), "other")

Now you can use filteredCountry as your x aesthetic.
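The ifelse() problem mentioned in EDIT3 of the question is most likely this one: when the `yes` argument is a factor, ifelse() copies the factor's underlying integer codes rather than its labels. A minimal sketch with hypothetical data (`data`, `value` and `threshold` are illustrative names, as in the line above):

```r
# Hypothetical toy data: country is a factor, value is some per-row measure
data <- data.frame(
  country = factor(c("Spain", "India", "Chile")),
  value   = c(10, 50, 30)
)
threshold <- 20  # hypothetical cut-off

# Pitfall: the factor's integer codes leak into the result
wrong <- ifelse(data$value > threshold, data$country, "other")

# Fix: convert the factor to character before calling ifelse()
data$filteredCountry <- ifelse(data$value > threshold,
                               as.character(data$country), "other")

print(wrong)                 # codes such as "2" instead of country names
print(data$filteredCountry)  # proper country names
```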

The ordering question comes up from time to time (for example, ggplot2: plot sorting). You need to set the levels of your country factor in the order you want the bars drawn. Your reordering command seems to sort by country name again; I would expect something like reorder(country, frequency), but sample data would help.
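For reference, a minimal sketch of what `reorder()` does, using a hypothetical per-country count table: it turns the first argument into a factor whose levels are sorted by a summary (by default the mean) of the second argument, so passing the negated frequency puts the most frequent country first.

```r
# Hypothetical per-country counts
freq_df <- data.frame(
  country = c("Chile", "India", "Spain"),
  freq    = c(5, 40, 12)
)

# Levels are sorted by mean(-freq), i.e. by decreasing frequency
freq_df$country <- reorder(freq_df$country, -freq_df$freq)

print(levels(freq_df$country))
```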

UPDATE: With the data provided, it becomes clear that you first need to create a summary data set:

 library(plyr)
 data <- read.table("aDDs.csv", sep = ",", header = TRUE)
 # one row per country with its number of answers
 summary <- ddply(data, .(answer), summarise, freq = length(answer))

This gives a summary data frame with one row per country (181 in total). You can now filter and reorder:
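If you prefer not to depend on plyr, a base-R sketch that should give the same kind of one-row-per-country summary is `as.data.frame(table(...))`; this swaps in a different technique from the answer's `ddply()`. One difference to watch: `table()` on a factor also reports unused levels with a count of 0, so `droplevels()` is applied first. The `toy` data below is hypothetical.

```r
# Hypothetical toy version of aDDs: answer is a factor with an unused level
toy <- data.frame(
  answer = factor(c("India", "India", "Spain", "India", "Chile"),
                  levels = c("Chile", "India", "Spain", "Peru"))
)

# Drop levels that never occur ("Peru"), then count each country
counts <- as.data.frame(table(droplevels(toy$answer)),
                        stringsAsFactors = FALSE)
names(counts) <- c("answer", "freq")

print(counts)
```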

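 # keep countries whose count is above the 90th percentile; lump the rest into "other"
 threshold <- quantile(summary$freq, 0.9)
 summary$filteredCountry <- ifelse(summary$freq > threshold, as.character(summary$answer), "other")
 # order the bars by decreasing frequency
 summary$filteredCountry <- reorder(summary$filteredCountry, -summary$freq)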
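To get a feel for how many countries the `quantile(freq, 0.9)` threshold keeps, here is a small sketch on a hypothetical frequency vector: a strict `>` comparison against the 90th percentile keeps roughly the top 10% of countries (ties and the interpolation used by `quantile()` can shift the exact number).

```r
# Hypothetical counts for 12 countries
freq <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 80)

# Default quantile() interpolates between the 10th and 11th sorted values
threshold <- quantile(freq, 0.9)

print(threshold)
print(sum(freq > threshold))  # number of countries kept
```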

Now you can plot:

 library(ggplot2)
 p <- ggplot(data = summary, aes(x = filteredCountry, y = freq))
 p <- p + geom_bar(aes(fill = filteredCountry), stat = "identity")
 p

Thanks to @CMichael's suggestions and answers to other related questions on SO, I managed to create a filtered and ordered bar chart with ggplot:

Create a vector with the most common country names:

 # summary() of a factor returns the (maxsum - 1) most frequent levels plus "(Other)"
 temp <- names(summary(aDDs$answer, maxsum = 12))
 aDDs$answer <- as.character(aDDs$answer) # IMPORTANT! This was the problem: convert to character values
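A small sketch of what `summary()` returns for a factor, on hypothetical data: with `maxsum = k` it keeps the `k - 1` most frequent levels and adds an `"(Other)"` entry for the rest, so `max=12` (which partially matches `maxsum`) yields 11 countries plus `"(Other)"`. Taking `names()` directly avoids the detour through a data frame.

```r
# Hypothetical factor of country answers
f <- factor(c("India", "India", "India", "Spain", "Spain", "Chile", "Peru"))

# Keep the 2 most frequent levels; the rest are summed into "(Other)"
s <- summary(f, maxsum = 3)
print(s)

# Country names only, without the "(Other)" bucket
top <- setdiff(names(s), "(Other)")
print(top)
```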

Create a new column that keeps the most frequent countries and labels the rest "Other":

 aDDs$top <- ifelse(aDDs$answer %in% temp, # condition: answer is one of the most frequent countries
                    aDDs$answer,           # then keep the country name
                    "Other")               # else label it "Other"
 aDDs$top <- as.factor(aDDs$top)           # turn the result back into a factor
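An alternative that sidesteps the character conversion entirely is to rename the factor levels in place: assigning the same new name to several levels merges them into one. This is a different technique from the ifelse() approach above; the `answer` and `top` values below are hypothetical.

```r
# Hypothetical factor and list of top countries
answer <- factor(c("India", "India", "Spain", "Chile", "Peru", "India"))
top <- c("India", "Spain")

# Rename every non-top level to "Other"; duplicate names are merged
lumped <- answer
levels(lumped)[!levels(lumped) %in% top] <- "Other"

print(table(lumped))
```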

Plot:

 # ascending counts: after coord_flip() the most frequent group ends up at the top
 ggplot(aDDs, aes(x = factor(top, levels = names(sort(table(top), decreasing = FALSE))),
                  fill = question)) +
   geom_bar() +
   coord_flip()

And here is the output (still needs some tweaking, but this is what I wanted):

[Image: the resulting bar chart]
