Is this the best way to create a frequency table using LINQ?

I want to write a function that reads a file and counts the number of times each word occurs. Assuming the file has already been read into a list of strings, one per line, I need a function that counts the occurrences of each word. First, is Dictionary<string,int> a good approach? The key is the word, and the value is the number of occurrences of that word.

I wrote this function, which iterates over every line and every word in each line and builds a dictionary:

    static IDictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        var dict = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            string[] words = line.Split(' ');
            foreach (string word in words)
            {
                // Increment an existing entry, or start a new one at 1.
                if (dict.ContainsKey(word))
                    dict[word]++;
                else
                    dict.Add(word, 1);
            }
        }
        return dict;
    }

However, I would like to write this function in a more functional style using LINQ (because LINQ is fun and I'm trying to improve my functional programming skills :D). I managed to come up with this expression, but I'm not sure whether it is the best way to do it functionally:

    static IDictionary<string, int> CountWords2(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(' '))
            .Aggregate(new Dictionary<string, int>(), (dict, word) =>
            {
                if (dict.ContainsKey(word))
                    dict[word]++;
                else
                    dict.Add(word, 1);
                return dict;
            });
    }

So, although I have two working solutions, I would also like to know what the best approach to this problem is. Does anyone have insight into LINQ and FP?

+4
4 answers

As Tim Robinson wrote, you can use GroupBy with ToDictionary as follows:

    public static Dictionary<string, int> CountWords3(IEnumerable<string> strings)
    {
        return strings
            .SelectMany(s => s.Split(' '))
            .GroupBy(w => w)
            .ToDictionary(g => g.Key, g => g.Count());
    }
+6

Look at GroupBy instead of Aggregate: it will give you a sequence of IGrouping<string, string> objects, one per distinct word. You can get the count for each word by calling .Count() on each group.
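To make the grouping step concrete, here is a minimal sketch of what GroupBy yields before anything is turned into a dictionary. The names GroupingSketch and CountByGroup are hypothetical, chosen only for illustration, and the split on a single space simply mirrors the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingSketch
{
    // Hypothetical helper: returns (word, count) pairs, most frequent first.
    public static List<KeyValuePair<string, int>> CountByGroup(IEnumerable<string> lines)
    {
        return lines
            .SelectMany(line => line.Split(' '))   // flatten the lines into one sequence of words
            .GroupBy(word => word)                 // one IGrouping<string, string> per distinct word
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(pair => pair.Value) // sort by frequency, highest first
            .ToList();
    }
}
```

For the input lines "a b a" and "b a c", this returns a with 3, b with 2 and c with 1, in that order. Each group's Key is the word and Count() is the size of the group, which is all a frequency table needs.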

+3

The following should do the job.

    static IDictionary<String, Int32> CountWords(IEnumerable<String> lines)
    {
        return lines
            .SelectMany(line => line.Split(' '))
            .GroupBy(word => word)
            .ToDictionary(group => group.Key, group => group.Count());
    }
+3

If you want to use LINQ query syntax (rather than calling the extension methods that LINQ is built on directly), you can write:

    var groups = from line in lines
                 from s in line.Split(new[] { "\t", " " }, StringSplitOptions.RemoveEmptyEntries)
                 group s by s into g
                 select g;
    var dic = groups.ToDictionary(g => g.Key, g => g.Count());

Your current implementation does not split on tabs and may count the empty string as a "word". I have changed the split to match what I think your intention is.
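The difference between the two split calls can be seen in a small sketch. The names SplitSketch, NaiveSplit and CleanSplit are hypothetical and exist only for this comparison; the second method uses the char[] overload of Split rather than the string[] overload shown above, which behaves the same way for single-character separators.

```csharp
using System;

public static class SplitSketch
{
    // Mirrors the question: split on a single space only.
    // Consecutive spaces produce empty entries, and tabs stay inside tokens.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(' ');
    }

    // Split on both space and tab, discarding empty entries.
    public static string[] CleanSplit(string line)
    {
        return line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the input "a  b\tc" (two spaces, then a tab), NaiveSplit returns three entries: "a", an empty string, and "b\tc". CleanSplit returns "a", "b" and "c". With the naive split, the empty string would end up as a key in the frequency table.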

0

Source: https://habr.com/ru/post/1315635/

