C#: compare two strings for a matching word

I have two strings containing letters and numbers, separated by spaces, for example "elza7ma wa2fa fel matab" and "2ana ba7eb el za7ma 2awy 2awy".

What is the fastest way to compare these two strings to find out whether they have a word in common?

I tried splitting one of them with string.Split and calling string.Compare against the whole array of words, but that is very slow, since I will be comparing many strings.

+6
string c#
5 answers

LINQ Solution

    using System.Linq;

    // One-off check; false here, because "elza7ma" is not the same word as "el" or "za7ma"
    "elza7ma wa2fa fel matab".Split()
        .Intersect("2ana ba7eb el za7ma 2awy 2awy".Split())
        .Any();

    // As a string extension method
    public static class StringExtensions
    {
        public static bool OneWordMatches(this string theString, string otherString)
        {
            return theString.Split().Intersect(otherString.Split()).Any();
        }
    }

    // Returns true ("2ana" appears in both strings)
    "elza7ma wa2fa fel matab 2ana".OneWordMatches("2ana ba7eb el za7ma 2awy 2awy");
+14

I think the easiest way is to split the strings into words and use a set structure, such as HashSet<string>, to check for duplicates. For example:

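    using System;
    using System.Collections.Generic;
    using System.Linq;

    public bool HasMatchingWord(string left, string right)
    {
        // Split(char[], StringSplitOptions) also compiles on .NET Framework,
        // which has no Split(string, StringSplitOptions) overload
        var separators = new[] { ' ' };
        var hashSet = new HashSet<string>(
            left.Split(separators, StringSplitOptions.RemoveEmptyEntries));

        // Any() stops at the first word of right that is found in the set
        return right
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Any(x => hashSet.Contains(x));
    }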
+5

You can split the two strings into words and build two hash tables / dictionaries. Then go through both and add each key to a third dictionary ( Dictionary<string, int> ), incrementing its count every time the key is seen. If any key in the third dictionary ends up with a count greater than one, that word is in both source strings.
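
The counting idea above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the names WordCountSketch and CommonWords are made up, and a HashSet<string> per input stands in for the two intermediate dictionaries, since all they need to do is remove repeats within one string.

```csharp
using System;
using System.Collections.Generic;

public static class WordCountSketch
{
    // Returns the words that occur in both strings, using one shared counter.
    public static List<string> CommonWords(string left, string right)
    {
        var counts = new Dictionary<string, int>();

        // Each input is reduced to its distinct words first, so a word that
        // repeats inside one string (e.g. "2awy 2awy") is counted only once.
        foreach (var input in new[] { left, right })
        {
            var distinct = new HashSet<string>(
                input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

            foreach (var word in distinct)
            {
                int seen;
                counts.TryGetValue(word, out seen); // seen is 0 when the key is absent
                counts[word] = seen + 1;
            }
        }

        // A count above one can only come from both inputs.
        var common = new List<string>();
        foreach (var pair in counts)
        {
            if (pair.Value > 1) common.Add(pair.Key);
        }
        return common;
    }
}
```

Unlike the Any()-based answers, this visits every word of both strings, so it fits better when the full list of shared words is wanted rather than a yes/no result.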

I would expect any algorithm for this problem to be "slow", especially for long input strings with many words.

+1

I would probably take the initial performance hit of splitting the strings and then sorting the words alphabetically and by word length. If you only need to find out whether one word matches, break out as soon as you find it. Once both arrays of split words are sorted alphabetically and by length, the number of comparisons you need to make is limited.
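
A minimal sketch of the sort-then-scan idea, under some assumptions: it sorts by ordinal string order only (not additionally by length), walks both sorted arrays with two indices, and returns at the first match. SortedScanSketch and HasCommonWord are hypothetical names.

```csharp
using System;

public static class SortedScanSketch
{
    public static bool HasCommonWord(string left, string right)
    {
        var separators = new[] { ' ' };
        string[] a = left.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        string[] b = right.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // Both arrays must be sorted with the same ordering used in the scan below.
        Array.Sort(a, StringComparer.Ordinal);
        Array.Sort(b, StringComparer.Ordinal);

        int i = 0, j = 0;
        while (i < a.Length && j < b.Length)
        {
            int cmp = string.CompareOrdinal(a[i], b[j]);
            if (cmp == 0) return true; // first shared word found, stop early
            if (cmp < 0) i++;          // a[i] is smaller, so it cannot match anything left in b
            else j++;                  // b[j] is smaller, so it cannot match anything left in a
        }
        return false;
    }
}
```

After sorting, the scan makes at most a.Length + b.Length comparisons, but the two sorts themselves dominate the cost, so this mostly pays off when the sorted arrays can be kept and reused across many comparisons.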

0
  • The easiest way is to compare every word of one string with every word of the other. This is a simple solution, but slow.
  • Another way is to sort both word lists and then repeatedly compare the two entries at the front of each list. This works like the merge step of mergesort, but with the goal of finding equal words.
  • Another way is to compile one list of words into a tree (a trie) and match the words of the other string against this tree. A regular expression can do this, or you can do it yourself. In your example, the first letter would have to be 2, b, e, or z. This way each word is checked only once, and the smallest number of characters is examined.
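
The tree-based option from the last bullet could look something like the sketch below. It is a hand-rolled trie, assuming exact, case-sensitive matching on space-separated words; the class and member names (WordTrieSketch, ContainsAnyWordOf) are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class WordTrieSketch
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true when a complete word ends at this node
    }

    private readonly Node root = new Node();

    // Builds the trie once from the words of one string.
    public WordTrieSketch(string line)
    {
        foreach (var word in Split(line))
        {
            var node = root;
            foreach (char c in word)
            {
                Node next;
                if (!node.Children.TryGetValue(c, out next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.IsWord = true;
        }
    }

    // True as soon as one word of the other string is found in the trie.
    public bool ContainsAnyWordOf(string line)
    {
        foreach (var word in Split(line))
        {
            if (Contains(word)) return true;
        }
        return false;
    }

    private bool Contains(string word)
    {
        var node = root;
        foreach (char c in word)
        {
            Node next;
            // A word is rejected at its first character that has no branch.
            if (!node.Children.TryGetValue(c, out next)) return false;
            node = next;
        }
        return node.IsWord;
    }

    private static string[] Split(string line)
    {
        return line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Built from "2ana ba7eb el za7ma 2awy 2awy", the root has branches for 2, b, e and z only, so a word such as "wa2fa" is discarded after a single character lookup. Whether this beats a HashSet<string> in practice would need measuring.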
0