Divide and conquer
You are trying to parse too much with a single simple expression, and that will not work well. The best approach here is to divide the problem into smaller ones and solve each separately. We can then combine everything into one pattern.
Let's write some patterns for the data you want to extract.
Season / episode:
S\d+(?:E\d+(?:\s*\p{Pd}\s*E\d+)?)?
I used \p{Pd} instead of - so that any kind of dash matches (hyphen, en dash, em dash, and so on).
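To see what \p{Pd} buys you, here is a minimal sketch that runs the season/episode pattern against a few strings. The sample strings and the ^...$ anchors are my additions for illustration; they are not part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

// Anchored copy of the season/episode pattern, so the whole string must match.
var episode = new Regex(@"^S\d+(?:E\d+(?:\s*\p{Pd}\s*E\d+)?)?$");

// Hypothetical samples: season only, single episode, then a range written with
// a hyphen-minus, an en dash (U+2013), an em dash (U+2014), and a slash.
var samples = new[]
{
    "S02",
    "S01E13",
    "S01E13 - E14",
    "S01E13 \u2013 E14",
    "S01E13\u2014E14",
    "S01E13 / E14"
};

foreach (var s in samples)
    Console.WriteLine("{0,-16} {1}", s, episode.IsMatch(s));
```

Every sample except the last should match, because a slash is not in the Unicode "dash punctuation" category.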
Date:
\d{4}\.\d{1,2}\.\d{1,2}
Or...
(?i:January|February|March|April|May|June|July|August|September|October|November|December)\s*\d{1,2},\s*\d{4}
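As a quick check of the two date patterns, the sketch below tries the numeric form first and falls back to the month-name form. The variable names numericDate and longDate are made up for this example; the titles are taken from the question's data.

```csharp
using System;
using System.Text.RegularExpressions;

// The two date patterns, kept separate so each can be tested on its own.
var numericDate = new Regex(@"\d{4}\.\d{1,2}\.\d{1,2}");
var longDate = new Regex(
    @"(?i:January|February|March|April|May|June|July|August|September|October|November|December)\s*\d{1,2},\s*\d{4}");

var titles = new[]
{
    "The Soup 2015.05.22 [mp4]",
    "Big Brother UK Live From The House (May 22, 2015)",
    "500 Questions S01E03"
};

foreach (var title in titles)
{
    // Try the numeric form first, then the month-name form.
    var m = numericDate.Match(title);
    if (!m.Success) m = longDate.Match(title);
    Console.WriteLine("{0} -> {1}", title, m.Success ? m.Value : "(no date)");
}
```

Note that both patterns only check the shape of the text. Something like 2015.99.99 would still match, so parse the captured value into a real date if you need it validated.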
A simple pattern for any extra information:
.*?
(yes, this is pretty general)
We can also detect the show format like this:
\[.*?\]
You can add more patterns as needed.
Now we can put everything into one pattern, using named groups to retrieve the data (it is written for RegexOptions.IgnorePatternWhitespace, so the spaces in it are not significant):
^\s*
(?<name>.*?)
(?<info>
    \s+
    (?:
        (?<episode>S\d+(?:E\d+(?:\s*\p{Pd}\s*E\d+)?)?)
      | (?<date>\d{4}\.\d{1,2}\.\d{1,2})
      | \(?(?<date>(?i:January|February|March|April|May|June|July|August|September|October|November|December)\s*\d{1,2},\s*\d{4})\)?
      | \[(?<format>.*?)\]
      | (?<extra>(?(info)|(?!)).*?)
    )
)*
\s*$
Just ignore the info group. It only exists for the conditional expression in extra, which lets extra match only after some other piece of info has already been found, so extra cannot consume what should be part of the show name. You may get several extra captures (one per word), so just join them with a space between each part.
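Joining the extra captures could look like the sketch below. It uses the combined pattern from above on one of the question's titles. The use of string.Join with LINQ is simply one way to do it, and the variable names are mine.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var re = new Regex(@"
    ^\s*
    (?<name>.*?)
    (?<info>
        \s+
        (?:
            (?<episode>S\d+(?:E\d+(?:\s*\p{Pd}\s*E\d+)?)?)
          | (?<date>\d{4}\.\d{1,2}\.\d{1,2})
          | \(?(?<date>(?i:January|February|March|April|May|June|July|August|September|October|November|December)\s*\d{1,2},\s*\d{4})\)?
          | \[(?<format>.*?)\]
          | (?<extra>(?(info)|(?!)).*?)
        )
    )*
    \s*$
", RegexOptions.IgnorePatternWhitespace);

var match = re.Match("Alaskan Bush People S02 Wild Times Special");

// The extra group is captured once per word; Captures holds every capture,
// while Value alone would only give the last one.
var extra = string.Join(" ",
    match.Groups["extra"].Captures.Cast<Capture>().Select(c => c.Value));

Console.WriteLine(match.Groups["name"].Value); // prints: Alaskan Bush People
Console.WriteLine(extra);                      // prints: Wild Times Special
```

Cast<Capture>() keeps the snippet working on older .NET Framework versions, where CaptureCollection only implements the non-generic IEnumerable.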
Code example:
using System;
using System.Text.RegularExpressions;

var inputData = new[]
{
    "Boyster S01E13 – E14",
    "Mysteries at the Museum S08E08",
    "Mysteries at the National Parks S01E07 – E08",
    "The Last Days Of… S01E06",
    "Born Naughty? S01E02",
    "Have I Got News For You S49E07",
    "Ellen 2015.05.22 Joseph Gordon Levitt [REPOST]",
    "The Soup 2015.05.22 [mp4]",
    "Big Brother UK Live From The House (May 22, 2015)",
    "Alaskan Bush People S02 Wild Times Special",
    "500 Questions S01E03"
};

// IgnorePatternWhitespace lets the pattern be laid out over several lines.
var re = new Regex(@"
    ^\s*
    (?<name>.*?)
    (?<info>
        \s+
        (?:
            (?<episode>S\d+(?:E\d+(?:\s*\p{Pd}\s*E\d+)?)?)
          | (?<date>\d{4}\.\d{1,2}\.\d{1,2})
          | \(?(?<date>(?i:January|February|March|April|May|June|July|August|September|October|November|December)\s*\d{1,2},\s*\d{4})\)?
          | \[(?<format>.*?)\]
          | (?<extra>(?(info)|(?!)).*?)
        )
    )*
    \s*$
", RegexOptions.IgnorePatternWhitespace);

foreach (var input in inputData)
{
    Console.WriteLine();
    Console.WriteLine("--- {0} ---", input);

    var match = re.Match(input);
    if (!match.Success)
    {
        Console.WriteLine("FAIL");
        continue;
    }

    foreach (var groupName in re.GetGroupNames())
    {
        // Skip the whole-match group and the helper group.
        if (groupName == "0" || groupName == "info")
            continue;

        var group = match.Groups[groupName];
        if (!group.Success)
            continue;

        // A group inside a repeated construct can capture several times.
        foreach (Capture capture in group.Captures)
            Console.WriteLine("{0}: '{1}'", groupName, capture.Value);
    }
}
And the result:
--- Boyster S01E13 – E14 ---
name: 'Boyster'
episode: 'S01E13 – E14'

--- Mysteries at the Museum S08E08 ---
name: 'Mysteries at the Museum'
episode: 'S08E08'

--- Mysteries at the National Parks S01E07 – E08 ---
name: 'Mysteries at the National Parks'
episode: 'S01E07 – E08'

--- The Last Days Of… S01E06 ---
name: 'The Last Days Of…'
episode: 'S01E06'

--- Born Naughty? S01E02 ---
name: 'Born Naughty?'
episode: 'S01E02'

--- Have I Got News For You S49E07 ---
name: 'Have I Got News For You'
episode: 'S49E07'

--- Ellen 2015.05.22 Joseph Gordon Levitt [REPOST] ---
name: 'Ellen'
date: '2015.05.22'
format: 'REPOST'
extra: 'Joseph'
extra: 'Gordon'
extra: 'Levitt'

--- The Soup 2015.05.22 [mp4] ---
name: 'The Soup'
date: '2015.05.22'
format: 'mp4'

--- Big Brother UK Live From The House (May 22, 2015) ---
name: 'Big Brother UK Live From The House'
date: 'May 22, 2015'

--- Alaskan Bush People S02 Wild Times Special ---
name: 'Alaskan Bush People'
episode: 'S02'
extra: 'Wild'
extra: 'Times'
extra: 'Special'

--- 500 Questions S01E03 ---
name: '500 Questions'
episode: 'S01E03'