How to match these strings with Regex?

I mainly have music file names, such as:

    <source>                        <target>
    "Travis - Sing"                 "Travis - Sing 2001.mp3"
    "Travis - Sing"                 "Travis - Sing Edit.mp3"
    "Travis - Sing"                 "Travis - Sing New Edit.mp3"
    "Mission Impossible I"          "Mission Impossible I - Main Theme.mp3"
    "Mission Impossible I"          "Mission Impossible II - Main Theme.mp3"
    "Mesrine - Death Instinct"      "Mesrine - Death Instinct - Le Million.mp3"
    "Mesrine - Public Enemy #1"     "Mesrine - Public Enemy #1 - Theme"
    "Se7en"                         "Se7en Motion Picture Soundtrack - Theme.mp3"

The brackets are not part of the actual rows; they are there for demonstration only.

I'm trying to match each "source" value with its "target" values.

I already have the original names, but right now I do a lot of string parsing to be able to match them. How can I achieve the same thing with Regex?

EDIT: There seems to be confusion.

"Travis - Sing" is my original string, and I'm trying to match it with:

    "Travis - Sing (2001).mp3"
    "Travis - Sing (Edit).mp3"
    "Travis - Sing (New Edit).mp3"

EDIT2: Brackets removed.

4 answers

You seem to be looking for all files that start with a specific string; that covers all of your examples. This is easy to do without regular expressions, using two nested loops or LINQ:

    var matches = from source in sources
                  select new
                  {
                      Source = source,
                      Targets = from file in targets
                                where file.StartsWith(source)
                                select file
                  };
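The "two loops" alternative mentioned above can be sketched as follows. This is Java rather than C# (Java is the other language used in this thread); the class and method names (PrefixGrouping, groupByPrefix) are hypothetical, and the sample lists come from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PrefixGrouping {
    // For every source name, collect the targets that start with it.
    // Two nested loops, no regular expressions involved.
    static Map<String, List<String>> groupByPrefix(List<String> sources, List<String> targets) {
        Map<String, List<String>> result = new LinkedHashMap<>(); // keeps the order of the sources
        for (String source : sources) {
            List<String> matches = new ArrayList<>();
            for (String target : targets) {
                if (target.startsWith(source)) { // case-sensitive prefix test
                    matches.add(target);
                }
            }
            result.put(source, matches);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sources = List.of("Travis - Sing", "Mission Impossible I", "Se7en");
        List<String> targets = List.of(
                "Travis - Sing 2001.mp3",
                "Travis - Sing Edit.mp3",
                "Mission Impossible II - Main Theme.mp3",
                "Se7en Motion Picture Soundtrack - Theme.mp3");
        groupByPrefix(sources, targets).forEach((source, matches) ->
                System.out.println(source + " -> " + matches));
    }
}
```

Because this is a plain prefix test, "Mission Impossible I" also picks up "Mission Impossible II - Main Theme.mp3", which is how the question's table pairs them.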

You can also use a regular expression instead of the StartsWith clause, for example:

    // Regex.Escape keeps characters such as ( ) . + in the source from being read as regex syntax
    where Regex.IsMatch(file, "^" + Regex.Escape(source), RegexOptions.IgnoreCase)

This could probably be optimized in many ways, but writing one long pattern, as Andrew suggests, won't make it any faster when the pattern has to be built dynamically.
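For the regular-expression variant, here is a Java sketch of the same case-insensitive prefix test (the names RegexPrefix and startsWithIgnoreCase are hypothetical). It passes the source through Pattern.quote, so characters such as ( ) . + in a file name are matched literally instead of being read as regex syntax.

```java
import java.util.regex.Pattern;

class RegexPrefix {
    // Case-insensitive "starts with" built from a regular expression.
    // Pattern.quote turns the source into a literal, so "(2001)" is not a group.
    static boolean startsWithIgnoreCase(String file, String source) {
        Pattern p = Pattern.compile("^" + Pattern.quote(source), Pattern.CASE_INSENSITIVE);
        return p.matcher(file).find(); // "^" anchors the match at the start of the file name
    }

    public static void main(String[] args) {
        System.out.println(startsWithIgnoreCase("travis - sing Edit.mp3", "Travis - Sing")); // true
        System.out.println(startsWithIgnoreCase("Mesrine - Public Enemy #1 - Theme", "Mesrine - Public Enemy #1")); // true
        // false; without Pattern.quote, (2001) would be a group and this would print true
        System.out.println(startsWithIgnoreCase("Travis - Sing 2001.mp3", "Travis - Sing (2001)"));
    }
}
```

When checking many files against the same source, compiling the pattern once per source instead of once per call avoids repeated work.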


From your reply to my comment, I take it you're looking for something simple.

In that case, you can put several search terms into one pattern, separated by the "|" character; this construct is called alternation.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    class Program
    {
        private static List<string> searchList = new List<string>
        {
            "Travis - Sing (2001).mp3",
            "Travis - Sing (Edit).mp3",
            "Mission Impossible I - Main Theme.mp3",
            "Mission Impossible II - Main Theme.mp3",
            "doesn't match"
        };

        static void Main(string[] args)
        {
            // "|" separates alternatives: a string matches if it contains either term.
            var matchRegex = new Regex("Travis - Sing|Mission Impossible I");
            var matchingStrings = searchList.Where(str => matchRegex.IsMatch(str));
            foreach (var str in matchingStrings)
            {
                Console.WriteLine(str);
            }
        }
    }

EDIT: If you want to find out which term matched, you can add named groups:

    static void Main(string[] args)
    {
        // Each alternative is a named group, so you can tell which one matched.
        var matchRegex = new Regex("(?<travis>Travis - Sing)|(?<mi>Mission Impossible I)");
        foreach (var str in searchList)
        {
            var match = matchRegex.Match(str);
            if (match.Success)
            {
                if (match.Groups["travis"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against travis", str));
                }
                else if (match.Groups["mi"].Success)
                {
                    Console.WriteLine(String.Format("{0} matches against mi", str));
                }
            }
        }
    }
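A Java sketch of both ideas in this answer, with the pattern built from a list instead of typed by hand: each search term is passed through Pattern.quote so it is matched literally, and each term gets its own numbered group so you can tell which one matched. The class and method names are hypothetical. Java also supports the (?<name>...) named-group syntax used above, read back with Matcher.group(String).

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AlternationMatcher {
    // Build "(\Qterm1\E)|(\Qterm2\E)|...": one capturing group per quoted term,
    // so group i + 1 corresponds to terms.get(i). An empty list gives the empty
    // pattern, which matches every string.
    static Pattern alternation(List<String> terms) {
        String regex = terms.stream()
                .map(t -> "(" + Pattern.quote(t) + ")")
                .collect(Collectors.joining("|"));
        return Pattern.compile(regex);
    }

    // Keep the strings that contain at least one term anywhere (like Regex.IsMatch).
    static List<String> filter(List<String> searchList, List<String> terms) {
        Pattern p = alternation(terms);
        return searchList.stream()
                .filter(s -> p.matcher(s).find())
                .collect(Collectors.toList());
    }

    // Report which term produced the leftmost match, if any.
    static Optional<String> matchedTerm(String input, List<String> terms) {
        Matcher m = alternation(terms).matcher(input);
        if (m.find()) {
            for (int i = 0; i < terms.size(); i++) {
                if (m.group(i + 1) != null) { // groups of alternatives that did not match are null
                    return Optional.of(terms.get(i));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> searchList = List.of(
                "Travis - Sing (2001).mp3",
                "Travis - Sing (Edit).mp3",
                "Mission Impossible I - Main Theme.mp3",
                "Mission Impossible II - Main Theme.mp3",
                "doesn't match");
        List<String> terms = List.of("Travis - Sing", "Mission Impossible I");
        for (String s : filter(searchList, terms)) {
            System.out.println(s + " matches against " + matchedTerm(s, terms).orElse("(none)"));
        }
    }
}
```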

Are there always several spaces between the source and the target? If so, the following will match:

 /^(.*?)\s{2,}(.*?)$/ 

It captures two parts: everything before the first run of two or more spaces, and everything after that run. (The capture groups use the non-greedy .*?, so if there are more than two spaces, the extra spaces do not end up in either capture.)
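A Java sketch of applying that expression to one line. The class and method names are hypothetical, and it assumes each line holds exactly one source and one target separated by two or more whitespace characters. In Java the pattern is written without the /.../ delimiters and with the backslash doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SourceTargetSplitter {
    // Group 1: text before the first run of 2+ whitespace characters.
    // Group 2: everything after that run.
    private static final Pattern LINE = Pattern.compile("^(.*?)\\s{2,}(.*?)$");

    // Returns {source, target}, or null if the line has no run of 2+ whitespace characters.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("Travis - Sing     Travis - Sing 2001.mp3");
        System.out.println(parts[0]); // Travis - Sing
        System.out.println(parts[1]); // Travis - Sing 2001.mp3
    }
}
```

Single spaces inside the names (such as "Travis - Sing") are left alone, because the split needs at least two whitespace characters in a row.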


The following method is more robust: it tolerates a different number of spaces or dashes between the words of the source and the target. For instance, the target may have extra spaces between words and will still match.

First, decide which characters count as word separators in your strings. Then split both the source and the target into tokens on those separators. Finally, check whether the source's tokens appear, in order, as the first tokens of the target.

For example, in Java, using whitespace and hyphens as separators:

    public boolean isValidMatch(String source, String target) {
        // Split on runs of whitespace or dashes; two dashes between words
        // split the same way as one.
        String[] sourceTokens = source.split("[\\s\\-]+");
        String[] targetTokens = target.split("[\\s\\-]+"); // split the target the same way
        // The target cannot start with the source if it has fewer tokens.
        if (sourceTokens.length > targetTokens.length) {
            return false;
        }
        // Every source token must equal the target token at the same position.
        for (int i = 0; i < sourceTokens.length; i++) {
            if (!sourceTokens[i].equals(targetTokens[i])) {
                return false;
            }
        }
        return true;
    }

PS: You might want to add the dot (.) as a separator too. With the source "Hello World" and the target "Hello World.mp3", the method currently returns false, because the regular expression does not split on a dot; if you add the dot to the separator class, it will match.
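A Java sketch of the token comparison with the dot added to the separator class, as the PS suggests. TokenPrefixMatch is a hypothetical name; otherwise the logic follows the isValidMatch method above, made static so it can be called without an instance.

```java
class TokenPrefixMatch {
    // Separators: runs of whitespace, dots and dashes (inside [...] the dot is literal).
    private static final String SEPARATORS = "[\\s.\\-]+";

    // True if the source's tokens are the first tokens of the target, in order.
    static boolean isValidMatch(String source, String target) {
        String[] sourceTokens = source.split(SEPARATORS);
        String[] targetTokens = target.split(SEPARATORS);
        if (sourceTokens.length > targetTokens.length) {
            return false;
        }
        for (int i = 0; i < sourceTokens.length; i++) {
            if (!sourceTokens[i].equals(targetTokens[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidMatch("Hello World", "Hello World.mp3"));             // true
        System.out.println(isValidMatch("Travis - Sing", "Travis  -- Sing Edit.mp3"));   // true
        System.out.println(isValidMatch("Travis - Sing", "Travis - Song.mp3"));          // false
    }
}
```

Unlike a plain prefix test, this compares whole tokens, so "Mission Impossible I" does not match "Mission Impossible II - Main Theme.mp3"; whether that is what you want depends on your data.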

