Find the number of spaces in a string

How can I split a string into separate columns at each space, for example "I am going out"?

Expected answer: 3 spaces, and these columns:

    Column1 Column2 Column3 Column4
    I       am      going   out
4 answers

If you need the actual column values, as your example shows, you can read the table from a text connection:

    > read.table(textConnection("I am going Out"))
      V1 V2    V3  V4
    1  I am going Out

To answer the title of your question, i.e. how many spaces there are, you can use ncol to count the columns above and subtract one. However, if you are only interested in the number of spaces, this is more efficient:

 length(gregexpr(" ", "I am going Out")[[1]]) 

This uses a regular expression to find the spaces.

[[1]] takes the first element of the result list, which corresponds to the first element of the input vector, which here has "I am going Out" as its only element. If you passed a different vector, the list might have more than one element, or none at all for an empty vector.
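To make the shape of the result concrete, here is a minimal sketch of what gregexpr returns for a vector with several elements. The input vector s and the variable names are made up for illustration.

```r
# Hypothetical input: three strings instead of one
s <- c("I am going Out", "Hello world", "a b c d e")

# gregexpr returns a list with one entry per input string
m <- gregexpr(" ", s)
n_entries <- length(m)

# Each entry holds the positions of the matches in that string
first_positions <- as.integer(m[[1]])

# Number of match positions per string
per_string <- sapply(m, length)

# An empty input vector gives an empty list
n_empty <- length(gregexpr(" ", character(0)))
```

Here first_positions holds the positions of the three spaces in the first string, and per_string has one count per element of s.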

If there is no space, gregexpr will still return a list of length 1, with -1 as the match position to indicate that there was no match. As a result, the above code incorrectly reports one space in this case. A more robust solution that addresses this, and also takes vectors as input, is as follows:

    countSpaces <- function(s) {
      sapply(gregexpr(" ", s), function(p) {
        sum(p >= 0)
      })
    }

The function works as follows: gregexpr returns a list of results, one for each element of the input vector s. sapply iterates over this list and, for each list item, calculates the number of matches. Instead of taking the length of the vector of match positions, it uses sum to count only the non-negative values, thereby discarding any -1 caused by a failed match. There is an implicit conversion from FALSE/TRUE to 0/1 in this sum. The result of sapply is again a vector, so it lines up with the input vector.
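As a quick check of the difference, the sketch below compares the plain length-based count with countSpaces on a made-up vector that includes a string without spaces and an empty string. The test vector and the names naive and counted are assumptions for illustration only.

```r
# Same definition as above, repeated so the example is self-contained
countSpaces <- function(s) {
  sapply(gregexpr(" ", s), function(p) sum(p >= 0))
}

# Hypothetical test vector: the last two elements contain no space
s <- c("I am going Out", "NoSpacesHere", "")

# Counting the length of the match vector miscounts the no-match case,
# because a failed match is reported as the single position -1
naive <- sapply(gregexpr(" ", s), length)

# Counting only non-negative positions gives zero for those elements
counted <- countSpaces(s)
```

With this input, naive reports one space for the last two strings, while counted reports zero.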

This function can be used to modify a data frame, as requested in one comment. Suppose you have a data frame named foo that has strings in the bar column and needs to be extended to hold these counts in a new baz column. You can write it like this:

 foo <- transform(foo, baz = countSpaces(bar)) 
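A small end-to-end sketch of that call, assuming a toy data frame: the contents of foo are invented here, and only the names foo, bar and baz come from the text above.

```r
# Same helper as above, repeated so the example is self-contained
countSpaces <- function(s) {
  sapply(gregexpr(" ", s), function(p) sum(p >= 0))
}

# Hypothetical data frame; stringsAsFactors = FALSE keeps bar as character
# on older R versions where the default would create a factor
foo <- data.frame(
  bar = c("I am going Out", "Hello world", "single"),
  stringsAsFactors = FALSE
)

# Add the per-row space count as a new column
foo <- transform(foo, baz = countSpaces(bar))
```

After this, foo has a baz column with one count per row, and the bar column is unchanged.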

Another way is to use the strsplit function:

    R> strsplit("I am going Out", " ")[[1]]
    [1] "I"     "am"    "going" "Out"

So we split the first argument, "I am going Out", at each occurrence of the second argument, a single space. Then we can just use length to count the pieces:

    R> length(strsplit("I am going Out", " ")[[1]])
    [1] 4
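Note that this length is the number of words, so the number of spaces is one less. The sketch below applies the same idea to a made-up vector. It assumes non-empty strings with no trailing space, since strsplit drops a match at the end of the string and returns character(0) for an empty string.

```r
# Hypothetical input vector (non-empty, no trailing spaces)
s <- c("I am going Out", "Hello world")

# strsplit returns a list with one character vector of pieces per input string
pieces <- strsplit(s, " ")

# Number of pieces (words) per string
n_words <- sapply(pieces, length)

# Under the stated assumptions, spaces = words - 1
n_spaces <- n_words - 1
```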

I have to admit that I did not read very carefully what exactly you wanted, but here is one possibility:

    x <- "I am going Out"
    nchar(x) - nchar(gsub(" ", "", x))
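Because nchar and gsub are both vectorized, the same subtraction works on a whole vector, and strings without spaces simply give zero. A minimal sketch with a made-up input vector follows. The fixed = TRUE argument is an addition here, so that the space is treated as literal text rather than as a regular expression.

```r
# Hypothetical input vector, including strings without any space
s <- c("I am going Out", "NoSpacesHere", "")

# Length of each string minus its length with all spaces removed
n_spaces <- nchar(s) - nchar(gsub(" ", "", s, fixed = TRUE))
```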

An alternative to MvG's original suggestion (albeit less elegant):

 as.data.frame(matrix(unlist(strsplit("I am going Out", "\\s+", perl=TRUE)), nrow=1)) 
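If the columns should carry the names from the question rather than the default V1 to V4, one option is to set them afterwards. This is only a sketch: the Column1, Column2, ... naming scheme is taken from the question's example, and the variable names words and df are made up.

```r
# Split on runs of whitespace, as in the line above
words <- strsplit("I am going Out", "\\s+")[[1]]

# One row, one column per word; keep the values as character
df <- as.data.frame(matrix(words, nrow = 1), stringsAsFactors = FALSE)

# Rename the default V1..V4 columns to Column1..Column4
names(df) <- paste0("Column", seq_along(words))
```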

You can also use str_count from the stringr package. It is less verbose, and avoiding regular expressions can be a little faster.

    library(stringr)
    text = "I am going Out"
    # matches a regular expression
    str_count(text, ' ')

Or, if you want something faster:

    # matches literal text
    str_count(text, fixed(' '))

Source: https://habr.com/ru/post/925291/

