How to iterate over a string word by word in C#?

I want to iterate over a string word by word.

If I have the string "incidentno and fintype or unitno", I would like to read the words one by one: "incidentno", "and", "fintype", "or" and "unitno".

+4
10 answers
foreach (string word in "incidentno and fintype or unitno".Split(' ')) { ... } 
+15
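    // The hyphen is placed last in the character class so it is a literal '-',
    // not a range from '.' to ':' (which would also match digits and '/').
    var regex = new Regex(@"\b[\s,.:;-]*");
    var phrase = "incidentno and fintype or unitno";
    var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));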

This works even if you have " .,; tabs and new lines " between your words.
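As an alternative sketch (not the approach used in this answer), you can match the words themselves instead of splitting on the separators. The class and method names below are made up for illustration, and the sketch assumes that a "word" is any run of \w characters.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Hypothetical helper: returns every run of word characters in the input.
    public static List<string> Words(string input)
    {
        return Regex.Matches(input, @"\w+")
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

Calling WordMatcher.Words("incidentno,\tand ;fintype\nor - unitno") should return the five words without any empty entries, since only the matches are collected.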

+12

A bit convoluted, I know, but you could define an iterator block as an extension method on string, e.g.:

    public static class StringExtensions
    {
        /// <summary>
        /// Lazily yields the space-separated words of the text, one at a time.
        /// </summary>
        public static IEnumerable<string> WordList(this string text)
        {
            int start = 0;   // index where the current word begins
            int end;         // index of the next space
            while ((end = text.IndexOf(' ', start)) != -1)
            {
                yield return text.Substring(start, end - start);
                start = end + 1;
            }
            yield return text.Substring(start);   // last (or only) word
        }
    }

    foreach (string word in "incidentno and fintype or unitno".WordList())
        System.Console.WriteLine("'" + word + "'");

This has the advantage of not creating a large array for long strings.
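A minimal sketch of what the lazy approach buys you: a caller that stops early never scans the rest of the string. The names LazyWordDemo, LazyWords, TakeWords and the Produced counter are hypothetical and exist only to make the effect observable.

```csharp
using System.Collections.Generic;

public static class LazyWordDemo
{
    // Counts how many words the iterator has produced so far (illustration only).
    public static int Produced;

    // Hypothetical lazy splitter: yields non-empty space-separated words one at a time.
    public static IEnumerable<string> LazyWords(string text)
    {
        int start = 0;
        while (start < text.Length)
        {
            int end = text.IndexOf(' ', start);
            if (end == -1) end = text.Length;
            if (end > start)
            {
                Produced++;
                yield return text.Substring(start, end - start);
            }
            start = end + 1;
        }
    }

    // Collects at most 'count' words, then stops iterating.
    public static List<string> TakeWords(string text, int count)
    {
        var result = new List<string>();
        if (count <= 0) return result;
        foreach (string word in LazyWords(text))
        {
            result.Add(word);
            if (result.Count == count) break;   // the remaining text is never scanned
        }
        return result;
    }
}
```

With the sample string, TakeWords(text, 2) returns "incidentno" and "and", and Produced is 2 rather than 5, because the iterator only runs as far as the caller asks it to.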

+11

Use the Split method of the string class:

    string[] words = "incidentno and fintype or unitno".Split(' ');

This splits on spaces, so words will contain [incidentno, and, fintype, or, unitno].

+4

Assuming words are always separated by spaces, you can use String.Split() to get an array of your words.
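If that assumption does not hold and tabs or line breaks may also appear, one possible sketch is to pass a null separator array, which makes Split treat every white-space character as a delimiter. The class name WhitespaceSplit is hypothetical.

```csharp
using System;

public static class WhitespaceSplit
{
    // A null separator array means "split on any white-space character";
    // RemoveEmptyEntries drops the empty strings left by consecutive delimiters.
    public static string[] Words(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, WhitespaceSplit.Words("incidentno  and\tfintype\nor unitno") yields the same five words as the single-space case.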

+3

There are several ways to do this. Two of the most convenient methods (in my opinion):

  • Using string.Split() to create an array. I would probably use this method because it is the easiest to understand.

Example:

    string startingSentence = "incidentno and fintype or unitno";
    string[] separatedWords = startingSentence.Split(' ');

Alternatively, you can use (this is what I would use):

    string[] separatedWords = startingSentence.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

StringSplitOptions.RemoveEmptyEntries removes any empty entries from the array, such as those produced by consecutive, leading or trailing spaces.

Then, to process each word:

    foreach (string word in separatedWords)
    {
        // Do something
    }
  • Or you can use regular expressions to solve this problem, as Darin demonstrated (copied below).

Example:

    // The hyphen is placed last in the character class so it is a literal '-',
    // not a range from '.' to ':' (which would also match digits and '/').
    var regex = new Regex(@"\b[\s,.:;-]*");
    var phrase = "incidentno and fintype or unitno";
    var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));

To process the words, use a loop like the one in the first option:

    foreach (string word in words)
    {
        // Do something
    }

Of course, there are many ways to solve this problem, but I think these two are the easiest to implement and maintain. I would go with the first option (string.Split()) because a regex can get quite confusing, while a split will work correctly most of the time.
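To see what StringSplitOptions.RemoveEmptyEntries changes in the first option, here is a small sketch that counts the entries produced with and without it. The class and method names are hypothetical.

```csharp
using System;

public static class SplitOptionsDemo
{
    // Plain split: consecutive or trailing spaces produce empty entries.
    public static int CountWithEmpties(string s)
    {
        return s.Split(' ').Length;
    }

    // Same split, but empty entries are dropped.
    public static int CountWithoutEmpties(string s)
    {
        return s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

For the input "incidentno  and fintype " (a double space and a trailing space), the first method returns 5 and the second returns 3.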

+2

When using Split, what about removing empty entries?

    string sentence = "incidentno and fintype or unitno";
    string[] words = sentence.Split(
        new char[] { ' ', ',', ';', '\t', '\n', '\r' },
        StringSplitOptions.RemoveEmptyEntries);
    foreach (string word in words)
    {
        // Process
    }

EDIT:

I cannot comment, so I post here, but this (published above) works:

 foreach (string word in "incidentno and fintype or unitno".Split(' ')) { ... } 

My understanding of foreach is that it calls GetEnumerator() once and then calls MoveNext() until it returns false. Therefore Split will not be re-evaluated on each iteration.
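One way to check this is to wrap the split in a method that counts its own calls. This is only a sketch: ForeachDemo, CountingSplit and SplitCalls are hypothetical names. For arrays the compiler may generate an indexed loop rather than calling GetEnumerator(), but in both cases the collection expression is evaluated once.

```csharp
public static class ForeachDemo
{
    // Number of times the split has actually run (illustration only).
    public static int SplitCalls;

    // Wrapper that records each call before splitting on spaces.
    public static string[] CountingSplit(string text)
    {
        SplitCalls++;
        return text.Split(' ');
    }

    // Iterates over the words and returns how many there were.
    public static int Iterate(string text)
    {
        int words = 0;
        foreach (string word in CountingSplit(text))
        {
            words++;
        }
        return words;
    }
}
```

After ForeachDemo.Iterate("incidentno and fintype or unitno") the method returns 5, while SplitCalls is 1.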

+1
    public static string[] MyTest(string inword, string regstr)
    {
        var regex = new Regex(regstr);
        // Split the string that was passed in, and drop the empty strings
        // left between adjacent separators.
        var words = regex.Split(inword).Where(w => w.Length > 0).ToArray();
        return words;
    }

? MyTest("incidentno, and .fintype- or ;: unitno", @"[^\w+]")

    [0]: "incidentno"
    [1]: "and"
    [2]: "fintype"
    [3]: "or"
    [4]: "unitno"
0

I would like to add some information to JDunkerley's answer.
You can easily make this method more flexible if you pass the string or char to search for as a parameter.

    public static IEnumerable<string> WordList(this string text, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", "separator");
        int start = 0;
        int end;
        while ((end = text.IndexOf(separator, start, StringComparison.Ordinal)) != -1)
        {
            yield return text.Substring(start, end - start);
            start = end + separator.Length;   // skip the whole separator, not just one character
        }
        yield return text.Substring(start);   // last (or only) word
    }

    public static IEnumerable<string> WordList(this string text, char separator)
    {
        int start = 0;
        int end;
        while ((end = text.IndexOf(separator, start)) != -1)
        {
            yield return text.Substring(start, end - start);
            start = end + 1;
        }
        yield return text.Substring(start);   // last (or only) word
    }
0

I wrote a string processor class. You can use it.

Example:

 metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString(); 

Class:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class StringProcessor
    {
        private static List<String> PrepositionList;

        // Replaces the Persian forms of kaf and yeh with their Arabic counterparts.
        public static string ToNormalString(this string strText)
        {
            if (String.IsNullOrEmpty(strText)) return String.Empty;
            char chNormalKaf = (char)1603;
            char chNormalYah = (char)1610;
            char chNonNormalKaf = (char)1705;
            char chNonNormalYah = (char)1740;
            string result = strText.Replace(chNonNormalKaf, chNormalKaf);
            result = result.Replace(chNonNormalYah, chNormalYah);
            return result;
        }

        // Splits the text into words, strips punctuation, and counts how often each word occurs.
        public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
            List<String> blackListWords = null,
            int minimumWordLength = 3,
            char splitor = ' ',
            bool perWordIsLowerCase = true)
        {
            string[] btArray = bodyText.ToNormalString().Split(splitor);
            long numberOfWords = btArray.LongLength;
            Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
            foreach (string word in btArray)
            {
                if (word != null)
                {
                    string lowerWord = word;
                    if (perWordIsLowerCase)
                        lowerWord = word.ToLower();
                    var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
                        .Replace("?", "").Replace("!", "").Replace(",", "")
                        .Replace("<br>", "").Replace(":", "").Replace(";", "")
                        .Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
                    if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
                    {
                        if (wordsDic.ContainsKey(normalWord))
                        {
                            var cnt = wordsDic[normalWord];
                            wordsDic[normalWord] = ++cnt;
                        }
                        else
                        {
                            wordsDic.Add(normalWord, 1);
                        }
                    }
                }
            }
            List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
            return keywords;
        }

        // Sorts by frequency (default) or by word, in descending order.
        public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
        {
            List<KeyValuePair<String, Int32>> result = null;
            if (isBasedOnFrequency)
                result = list.OrderByDescending(q => q.Value).ToList();
            else
                result = list.OrderByDescending(q => q.Key).ToList();
            return result;
        }

        public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
        {
            List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
            return result;
        }

        public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
        {
            List<String> result = new List<String>();
            foreach (var item in list)
            {
                result.Add(item.Key);
            }
            return result;
        }

        public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
        {
            List<Int32> result = new List<Int32>();
            foreach (var item in list)
            {
                result.Add(item.Value);
            }
            return result;
        }

        // Joins the items into one string; note that the separator also follows the last item.
        public static String AsString<T>(this List<T> list, string seprator = ", ")
        {
            String result = string.Empty;
            foreach (var item in list)
            {
                result += string.Format("{0}{1}", item, seprator);
            }
            return result;
        }

        private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
        {
            bool result = false;
            if (blackListWords == null) return false;
            foreach (var w in blackListWords)
            {
                if (w.ToNormalString().Equals(word))
                {
                    result = true;
                    break;
                }
            }
            return result;
        }
    }
-1
