I have several large datasets with ~10 columns and ~200,000 rows. Not every column contains a value in every row, although at least one column must contain a value for a row to be present. I would like to set a threshold for the number of NAs allowed in a row.
My data frame looks something like this:
    ID  q  r  s  t  u  v  w  x  y  z
    A   1  5 NA  3  8  9 NA  8  6  4
    B   5 NA  4  6  1  9  7  4  9  3
    C  NA  9  4 NA  4  8  4 NA  5 NA
    D   2  2  6  8  4 NA  3  7  1 32
I would like to delete rows that contain more than two NA cells, to get:
    ID  q  r  s  t  u  v  w  x  y  z
    A   1  5 NA  3  8  9 NA  8  6  4
    B   5 NA  4  6  1  9  7  4  9  3
    D   2  2  6  8  4 NA  3  7  1 32
complete.cases deletes every row that contains any NA, and I know it is possible to delete rows that contain NA in certain columns. Is there a way to make this independent of which columns contain the NAs, and base it instead on how many NAs the row has in total?
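A minimal sketch of a column-agnostic threshold: is.na() on a data frame returns a logical matrix, and rowSums() counts the TRUE values in each row, so the count does not depend on which columns the NAs fall in. The helper name drop_rows_with_many_na and its value_cols argument are hypothetical, and excluding the ID column from the count is an assumption (it makes no difference here because ID has no NAs).

```r
# Example data from the question: an ID column plus ten value columns q..z
df <- data.frame(
  ID = c("A", "B", "C", "D"),
  q = c(1, 5, NA, 2),
  r = c(5, NA, 9, 2),
  s = c(NA, 4, 4, 6),
  t = c(3, 6, NA, 8),
  u = c(8, 1, 4, 4),
  v = c(9, 9, 8, NA),
  w = c(NA, 7, 4, 3),
  x = c(8, 4, NA, 7),
  y = c(6, 9, 5, 1),
  z = c(4, 3, NA, 32),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows with at most `max_na` missing values,
# regardless of which columns they are in. By default the ID column is
# left out of the count (an assumption; adjust value_cols as needed).
drop_rows_with_many_na <- function(data, max_na,
                                   value_cols = setdiff(names(data), "ID")) {
  na_count <- rowSums(is.na(data[, value_cols, drop = FALSE]))
  data[na_count <= max_na, , drop = FALSE]
}

rowSums(is.na(df))   # NA count per row: 2 1 4 1
kept <- drop_rows_with_many_na(df, max_na = 2)
kept$ID              # "A" "B" "D"
```

With max_na = 0 this keeps only fully complete rows, which matches complete.cases() on this data. Because rowSums(is.na(...)) is vectorised, it should also be reasonably fast on ~200,000 rows.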
Alternatively: this data frame is generated by merging several data frames using
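    args <- commandArgs(trailingOnly = TRUE)  # assumption: args holds the command-line arguments
    file1 <- read.delim("~/file1.txt")
    file2 <- read.delim(file = args[1])
    file1 <- merge(file1, file2, by = "chr.pos", all = TRUE)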
Perhaps the merge function could be changed?
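As far as I know, base R's merge() has no argument that limits NAs per row: all = TRUE is a full outer join that fills unmatched columns with NA, while all = FALSE (an inner join) drops every key missing from either file, which is stricter than a threshold. A minimal sketch of merging first and filtering afterwards, using two small in-memory data frames as hypothetical stand-ins for the files (the value column names a1, a2, b1, b2 are made up):

```r
# Hypothetical stand-ins for file1.txt and the file passed as args[1];
# both share the key column chr.pos.
file1 <- data.frame(chr.pos = c("1:100", "1:200", "1:300"),
                    a1 = c(1, 2, 3), a2 = c(4, NA, 6),
                    stringsAsFactors = FALSE)
file2 <- data.frame(chr.pos = c("1:100", "1:400"),
                    b1 = c(7, 8), b2 = c(9, NA),
                    stringsAsFactors = FALSE)

# Full outer join: keys missing from one side get NA in that side's columns
merged <- merge(file1, file2, by = "chr.pos", all = TRUE)

# Filter after the merge, counting NAs only in the value columns (not the key)
max_na <- 2
value_cols <- setdiff(names(merged), "chr.pos")
na_count <- rowSums(is.na(merged[, value_cols, drop = FALSE]))
merged <- merged[na_count <= max_na, , drop = FALSE]

merged$chr.pos   # "1:100" "1:300"
```

When merging many files in a loop, the same filter can be applied once at the end, since the NA count of a row only settles after every file has been joined.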
Thanks.
user2662708