Get the words around a position in a string

I would like to get the words around a certain position in a string, for example the two words before it and the two words after it.

For example, consider this code:

    String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
    String find = "I";

    for (int index = str.indexOf("I"); index >= 0; index = str.indexOf("I", index + 1)) {
        System.out.println(index);
    }

This prints the index of each occurrence of the word "I". But I want to be able to extract the words around these positions.

I want to be able to print "John and I like to" and "and hiking I have two".

The search term is not limited to a single word. A search for "John and" should return "name is John and I like".

Is there any neat, smart way to do this?

+7
5 answers

One word:

You can achieve this using the String split() method. This solution is O(n).

    public static void main(String[] args) {
        String str = "Hello my name is John and I like to go fishing and "
                + "hiking I have two sisters and one brother.";
        String find = "I";

        String[] sp = str.split(" +"); // "+" for multiple spaces
        // start at 0 so that matches in the first two words are not skipped
        for (int i = 0; i < sp.length; i++) {
            if (sp[i].equals(find)) {
                // check the bounds to avoid an ArrayIndexOutOfBoundsException
                String surr = (i - 2 >= 0 ? sp[i - 2] + " " : "")
                        + (i - 1 >= 0 ? sp[i - 1] + " " : "")
                        + sp[i]
                        + (i + 1 < sp.length ? " " + sp[i + 1] : "")
                        + (i + 2 < sp.length ? " " + sp[i + 2] : "");
                System.out.println(surr);
            }
        }
    }

Output:

    John and I like to
    and hiking I have two

Multi-word:

Regex is a great and clean solution for the case where find consists of several words. Due to its nature, however, it misses the cases where the surrounding words also match find (see the example below).

The algorithm below considers all cases (the whole solution space). Keep in mind that, due to the nature of the problem, this solution is O(n*m) in the worst case (with n the number of words in str and m the number of words in find).

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        String find = "John and";

        String[] sp = str.split(" +");      // "+" for multiple spaces
        String[] spMulti = find.split(" +"); // "+" for multiple spaces
        // start at 0 so that matches in the first two words are not skipped
        for (int i = 0; i < sp.length; i++) {
            // count how many words of spMulti match, starting at position i
            int j = 0;
            while (j < spMulti.length && i + j < sp.length
                    && sp[i + j].equals(spMulti[j])) {
                j++;
            }
            if (j == spMulti.length) { // found spMulti entirely
                StringBuilder surr = new StringBuilder();
                // up to two words before the match
                if (i - 2 >= 0) {
                    surr.append(sp[i - 2]);
                    surr.append(" ");
                }
                if (i - 1 >= 0) {
                    surr.append(sp[i - 1]);
                    surr.append(" ");
                }
                // the match itself
                for (int k = 0; k < spMulti.length; k++) {
                    if (k > 0) {
                        surr.append(" ");
                    }
                    surr.append(sp[i + k]);
                }
                // up to two words after the match
                if (i + spMulti.length < sp.length) {
                    surr.append(" ");
                    surr.append(sp[i + spMulti.length]);
                }
                if (i + spMulti.length + 1 < sp.length) {
                    surr.append(" ");
                    surr.append(sp[i + spMulti.length + 1]);
                }
                System.out.println(surr.toString());
            }
        }
    }

Output:

    name is John and John and
    John and John and I like
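
To illustrate the regex caveat mentioned in this answer, here is a minimal sketch of a regex-based search on the same input. The class name `RegexOverlapDemo`, the helper `regexContexts`, and the exact pattern are assumptions made for this illustration and are not part of the answer's code. Because a regex match consumes the two words that follow it, the second "John and" is swallowed as context of the first and is never reported on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexOverlapDemo {

    // Hypothetical helper: collects every non-overlapping match of
    // "two words, the search term, two words".
    static List<String> regexContexts(String str, String find) {
        Pattern pattern = Pattern.compile(
                "\\S+\\s+\\S+\\s+" + Pattern.quote(find) + "\\s+\\S+\\s+\\S+");
        Matcher matcher = pattern.matcher(str);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            // the next search resumes after the end of this match
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        // prints [name is John and John and]
        System.out.println(regexContexts(str, "John and"));
    }
}
```

Only one result is returned here, whereas the split-based loop reports both occurrences.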
+10

Here is another way I discovered using Regex:

    // requires java.util.regex.Matcher and java.util.regex.Pattern
    String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
    String find = "I";

    // group 1: the two words before the match, group 2: the two words after it
    Pattern pattern = Pattern.compile(
            "([^\\s]+\\s+[^\\s]+)\\s+" + Pattern.quote(find) + "\\s+([^\\s]+\\s[^\\s]+\\s+)");
    Matcher matcher = pattern.matcher(str);
    while (matcher.find()) {
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
    }

Output:

    John and
    like to
    and hiking
    have two
+2

Use String.split() to split the text into words. Then find "I" and put the surrounding words together:

    String[] parts = str.split(" ");
    for (int i = 0; i < parts.length; i++) {
        if (parts[i].equals("I")) {
            String out = parts[i-2] + " " + parts[i-1] + " " + parts[i] + " " + parts[i+1]; // etc.
        }
    }

Of course, you need to check that i-2 (and every other index you use) is valid, and using a StringBuffer would help performance if you have a lot of data.
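
The index check mentioned above could look roughly like the following sketch. The class name `SurroundingWords`, the helper `around`, and the sample sentence are made up for this illustration; a StringBuilder is used in place of StringBuffer because no synchronization is needed here.

```java
public class SurroundingWords {

    // Hypothetical helper: joins the words at positions i-2 .. i+2,
    // silently skipping positions that fall outside the array.
    static String around(String[] parts, int i) {
        StringBuilder out = new StringBuilder();
        for (int k = i - 2; k <= i + 2; k++) {
            if (k < 0 || k >= parts.length) {
                continue; // index outside the array, nothing to append
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(parts[k]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // sample sentence chosen so that "I" is the first and the last word
        String str = "I like to go fishing and hiking and so do I";
        String[] parts = str.split(" ");
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].equals("I")) {
                System.out.println(around(parts, i));
            }
        }
    }
}
```

With this sample sentence the program prints "I like to" and "so do I", so matches at either end of the text no longer throw an ArrayIndexOutOfBoundsException.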

+1
    // requires java.util.Arrays and java.util.List; sentence is the input text

    // Convert the sentence to a list of words
    String[] stringArray = sentence.split(" ");
    List<String> stringList = Arrays.asList(stringArray);

    // Which word should be matched?
    String toMatch = "I";

    // How many words before and after do you want?
    int before = 2;
    int after = 2;

    for (int i = 0; i < stringList.size(); ++i) {
        if (toMatch.equals(stringList.get(i))) {
            int index = i;
            // only handle matches that have enough words on both sides
            if (0 <= index - before && index + after < stringList.size()) {
                StringBuilder sb = new StringBuilder();
                for (int j = index - before; j <= index + after; ++j) {
                    sb.append(stringList.get(j));
                    sb.append(" ");
                }
                String result = sb.toString().trim();
                // Do something with result
            }
        }
    }

It extracts exactly two words before and two words after the match, and skips matches that do not have that many words on both sides. It could be extended to print at most two words before and after, rather than exactly two.

EDIT: Damn... way too slow, and lacking the fancy ternary operators :/
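
The "at most two words" extension mentioned in this answer could be sketched by clamping the window to the bounds of the list. The class name `ClampedWindow`, the helper `contexts`, and the sample sentence are assumptions made for this illustration, not the answerer's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClampedWindow {

    // Hypothetical helper: for every match, returns up to `before` words
    // before it and up to `after` words after it, joined by single spaces.
    static List<String> contexts(String sentence, String toMatch, int before, int after) {
        List<String> words = Arrays.asList(sentence.split(" +"));
        List<String> results = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (toMatch.equals(words.get(i))) {
                int from = Math.max(0, i - before);               // inclusive
                int to = Math.min(words.size(), i + after + 1);   // exclusive
                results.add(String.join(" ", words.subList(from, to)));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [I have two, sisters and I]
        System.out.println(contexts("I have two sisters and I", "I", 2, 2));
    }
}
```

Clamping with Math.max and Math.min keeps matches near the start or end of the text instead of dropping them.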

+1
source
    // requires java.util.ArrayList and java.util.List
    public static void main(String[] args) {
        String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
        String find = "I";
        int countWords = 3;
        List<String> strings = countWordsBeforeAndAfter(str, find, countWords);
        strings.stream().forEach(System.out::println);
    }

    public static List<String> countWordsBeforeAndAfter(String paragraph, String search, int countWordsBeforeAndAfter) {
        List<String> searchList = new ArrayList<>();
        String str = paragraph;
        String find = search;
        int countWords = countWordsBeforeAndAfter;
        String[] sp = str.split(" +"); // "+" for multiple spaces
        for (int i = 0; i < sp.length; i++) {
            if (sp[i].equals(find)) {
                // collect up to countWords words before the match
                String before = "";
                for (int j = countWords; j > 0; j--) {
                    if (i - j >= 0) before += sp[i - j] + " ";
                }
                // collect up to countWords words after the match
                String after = "";
                for (int j = 1; j <= countWords; j++) {
                    if (i + j < sp.length) after += " " + sp[i + j];
                }
                String searchResult = before + find + after;
                searchList.add(searchResult);
            }
        }
        return searchList;
    }
0