Java counting the number of occurrences of a word in a string

I am reading a large text file, and I need to find out how many times certain words appear, for example the word the. I process the file line by line, and each line is a string.

I need to make sure that I count only legitimate the's: the the in other should not be counted. This means I know I need to use regular expressions somehow. What I have tried so far is this:

numSpace += line.split("[^a-z]the[^a-z]").length;  

I realize the regular expression may be wrong at the moment, but I also tried without it, just looking for occurrences of the word the, and I still get wrong numbers. I was under the impression that this would split the string into an array, and that the number of pieces would tell me how many times the word occurs in the string. Any ideas would be appreciated.

Update: Based on some of the suggestions, I came up with the following:

numThe += line.split("[^a-zA-Z][Tt]he[^a-zA-Z]", -1).length - 1;

I am still getting some strange numbers, though. I was able to get an accurate overall count (without the regex); now my problem is with the regex.

+5
8 answers

Using split to count is not the most efficient approach, but if you insist on it, the correct way is this:

haystack.split(needle, -1).length - 1

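If you do not set limit to -1, split defaults to 0, which discards trailing empty strings and throws off the count.

From the API:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. [...] If n is zero then [...] trailing empty strings will be discarded.

You also need to subtract 1 from the length of the array, because N occurrences of the delimiter split the string into N+1 parts.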
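The effect of the limit argument is easy to check on a small input. The following is only a sketch; the class name SplitLimitDemo and the helper countWithLimit are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

public class SplitLimitDemo {
    // Hypothetical helper: counts delimiter occurrences as (number of pieces - 1).
    // Pattern.quote makes the needle match literally instead of as a regex.
    public static int countWithLimit(String haystack, String needle, int limit) {
        return haystack.split(Pattern.quote(needle), limit).length - 1;
    }

    public static void main(String[] args) {
        String haystack = "a,b,,"; // three commas, two of them trailing

        // limit -1 keeps trailing empty strings: ["a", "b", "", ""]
        System.out.println(countWithLimit(haystack, ",", -1)); // prints 3

        // limit 0 drops trailing empty strings: ["a", "b"]
        System.out.println(countWithLimit(haystack, ",", 0));  // prints 1
    }
}
```

With limit 0 the two trailing commas go missing from the count, which is exactly the kind of strange number described in the question.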


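As for the regex itself (i.e. the needle), you can put the \b word-boundary anchor on both sides of the word. If the word may contain metacharacters (for example, "$US"), you may want to Pattern.quote it.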
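As a rough sketch of that idea, the helper below wraps a quoted word in \b anchors and counts the matches. The names WordBoundaryCount and countWord are hypothetical. One caveat worth keeping in mind: \b only behaves as a word boundary when the word itself starts and ends with a word character (letter, digit or underscore), so the example uses "a.b" rather than a word that starts with a symbol.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryCount {
    // Hypothetical helper: counts whole-word occurrences of a literal word.
    public static int countWord(String line, String word) {
        // Pattern.quote makes metacharacters such as '.' match literally.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "other" and "then" contain "the" but are not counted.
        System.out.println(countWord("the other then the", "the")); // prints 2

        // Without quoting, "a.b" would also match "aXb".
        System.out.println(countWord("a.b aXb", "a.b"));            // prints 1
    }
}
```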


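Regarding your update:

numThe += line.split("[^a-zA-Z][Tt]he[^a-zA-Z]", -1).length - 1;

I am still getting some strange numbers, though. I was able to get an accurate overall count (without the regex); now my problem is with the regex.

The problem now is that you are not matching just [Tt]he: you are also matching the character before and the character after it, each of which must be [^a-zA-Z] (that is, every match is 5 characters long!). When [Tt]he sits at the very start or end of the line, there is no such character, so it is not counted!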

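One way to fix this is something like:

"(^|[^a-zA-Z])[Tt]he([^a-zA-Z]|$)"

This handles the start and end of the line, but each match still consumes the characters around the word.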

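A better option is to use (negative) lookarounds:

"(?<![a-zA-Z])[Tt]he(?![a-zA-Z])"

This matches only [Tt]he itself; the lookarounds assert that there is no letter before or after it, without consuming anything. Note that, if you still use split, a match at the very end of the line leaves a trailing empty string "" in the array, which is kept only with the -1 limit.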
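To see how the patterns differ, here is a small comparison on one sample line, using split with limit -1 throughout. It is a sketch only: the class LookaroundSplitDemo and the helper countBySplit are invented for this example, and the lookaround pattern is written with (?![a-zA-Z]) as the trailing assertion, i.e. "not followed by a letter".

```java
public class LookaroundSplitDemo {
    // Hypothetical helper: counts regex matches as (number of pieces - 1).
    public static int countBySplit(String line, String regex) {
        return line.split(regex, -1).length - 1;
    }

    public static void main(String[] args) {
        String line = "The the other the"; // three whole-word the's

        // Needs a real non-letter on both sides, so start/end of line are missed.
        String consuming = "[^a-zA-Z][Tt]he[^a-zA-Z]";
        // Accepts start/end of line, but still consumes the neighboring space.
        String anchored = "(^|[^a-zA-Z])[Tt]he([^a-zA-Z]|$)";
        // Zero-width checks: nothing around the word is consumed.
        String lookaround = "(?<![a-zA-Z])[Tt]he(?![a-zA-Z])";

        System.out.println(countBySplit(line, consuming));  // prints 1
        System.out.println(countBySplit(line, anchored));   // prints 2
        System.out.println(countBySplit(line, lookaround)); // prints 3
    }
}
```

The anchored pattern misses the second word because the space between "The" and "the" was already consumed by the first match. Only the lookaround version counts all three.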


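Non-split solution

Using split to count is wasteful, since it builds an array of substrings that you never use.

Instead, call Pattern.compile once and then loop with while (matcher.find()) count++;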
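A minimal sketch of the Pattern/Matcher approach, assuming the lines have already been read into strings. The class name MatcherCount, the method countThe and the sample lines are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherCount {
    // Compiled once and reused for every line.
    private static final Pattern THE =
            Pattern.compile("(?<![a-zA-Z])[Tt]he(?![a-zA-Z])");

    // Hypothetical helper: counts whole-word the/The in one line.
    public static int countThe(String line) {
        Matcher matcher = THE.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {
            "The cat saw the other cat.",
            "Then the theme: the end"
        };
        int numThe = 0;
        for (String line : lines) {
            numThe += countThe(line);
        }
        System.out.println(numThe); // prints 4
    }
}
```

No intermediate array is built, and there is no off-by-one adjustment to remember.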

+9

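Why not run each line through Java's StringTokenizer? It can break the words apart not only on spaces, but also on commas and other punctuation. Then walk through the tokens and count each "the", or any other word you like.

This is easy to extend into a map that uses each word as a key and keeps a count per word. You may also want to stem each word first, so that variants of the same word are counted together.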
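The tokenizer-plus-map idea could look roughly like this. It is a sketch under assumptions: the delimiter set is just one plausible choice, the names TokenizerWordCount and countWords are hypothetical, Map.merge needs Java 8 or later, and stemming is left out.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class TokenizerWordCount {
    // Hypothetical helper: lower-cased word -> number of occurrences in the line.
    public static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumed delimiter set: whitespace plus some common punctuation.
        StringTokenizer tokens = new StringTokenizer(line, " \t\n\r.,;:!?\"()");
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken().toLowerCase();
            // Insert 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat, the dog; then the other.");
        System.out.println(counts.get("the")); // prints 3
    }
}
```

Because whole tokens are compared, "then" and "other" do not add to the count for "the".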

+4

Another option is to call String.indexOf(String, int) in a loop, moving the start index forward after each hit, like this:

int occurrences = 0;
int index = 0;
while (index < s.length() && (index = s.indexOf("the", index)) >= 0) {
    occurrences++;
    index += 3; // skip past this match (length of "the")
}
+4

Try this:

Pattern pattern = Pattern.compile("Thewordyouwant");
Matcher matcher = pattern.matcher(string);
int count = 0;
while (matcher.find()) {
    count++;
}
+4

Use the \b word-boundary anchor:

\bthe\b

Also, if you use split, the number of times the occurs in the string is 1 less than the length of the array.

+1

How about searching for "the" with Boyer-Moore (or a similar string-search algorithm)?

0
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

public class OccurenceOfWords {
    public static void main(String[] args) {
        String file = "c:\\customer1.txt";
        // TreeMap keeps the words in alphabetical order
        Map<String, Integer> index = new TreeMap<>();

        // try-with-resources (Java 7+) closes the reader automatically
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            // read the next line once per iteration, not once per word
            while ((line = br.readLine()) != null) {
                // split on whitespace and common punctuation
                for (String token : line.split("[ \n\t\r:;',.(){}]")) {
                    String word = token.toLowerCase();
                    if (!word.isEmpty()) {
                        Integer occur = index.get(word);
                        index.put(word, occur == null ? 1 : occur + 1);
                    }
                }
            }
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }

        // print every word with its count
        for (Map.Entry<String, Integer> entry : index.entrySet()) {
            System.out.printf("\n%10s\t%d", entry.getKey(), entry.getValue());
        }
    }
}
-2
