I am not quite sure how to phrase this question. I just started working on a bunch of tweets, and after some basic cleaning, some of them look like this:
x <- c("stackoverflow is a great site",
"stackoverflow is a great si",
"stackoverflow is a great",
"omg it is friday and so sunny",
"omg it is friday and so",
"arggh how annoying")
Basically, I want to remove repetitions by checking whether one line is just the beginning of another, and keep only the longest version of each. In this case, my result should be:
[1] "stackoverflow is a great site"
[2] "omg it is friday and so sunny"
[3] "arggh how annoying"
because all the others are truncated repetitions of the lines above. I tried to use unique(), but it does not return the result I want, because it only treats strings as duplicates when they match over their entire length. Any pointers, please?
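To make the goal concrete, here is a minimal sketch of the prefix check described above, not a definitive solution. The function name `drop_truncated` is made up for illustration and does not come from any package. It assumes that a string counts as a truncated repetition whenever some other, longer string in the vector begins with it, and it uses only base R functions (`substr`, `nchar`, `vapply`) so that it should also run on older versions such as 3.1.1.

```r
# Hypothetical helper: drop every string that is a strict prefix of
# another string in the same vector.
drop_truncated <- function(x) {
  x <- unique(x)  # remove exact duplicates first
  is_truncated <- vapply(
    x,
    function(s) {
      # TRUE if some other, different string starts with s
      any(substr(x, 1, nchar(s)) == s & x != s)
    },
    logical(1),
    USE.NAMES = FALSE
  )
  x[!is_truncated]
}

x <- c("stackoverflow is a great site",
       "stackoverflow is a great si",
       "stackoverflow is a great",
       "omg it is friday and so sunny",
       "omg it is friday and so",
       "arggh how annoying")

# Keeps only the three full-length tweets, in their original order
drop_truncated(x)
```

Two caveats with this sketch: it compares every string with every other one, so it may be slow on a very large set of tweets, and it would also drop a genuinely different tweet that happens to be the exact beginning of a longer one.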
I am using R version 3.1.1 on Mac OSX 10.7 ...
Thanks!