Extract dates from file name

I have a situation where I need to extract dates from file names whose common template is [filename_]YYYYMMDD[.fileExtension]

e.g. "xxx_20100326.xls" or "x2v_20100326.csv"

The code below works:

 // The substring length is set to 8
 // because YYYYMMDD is 8 characters long.
 public static string ExtractDatesFromFileNames(string fileName)
 {
     return fileName.Substring(fileName.IndexOf("_") + 1, 8);
 }

Is there a better option to achieve the same?

I am mostly looking for standard practice.

I am using C# 3.0 and .NET Framework 3.5.

Edit:

I like LC's answer and used his approach, for example:

 string regExPattern = "^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\\..*)?$";
 string result = Regex.Match(fileName, @regExPattern).Groups[1].Value;

Function input: "x2v_20100326.csv"

But the output is 2010 instead of the expected 20100326.

Can someone help?

+4
3 answers

I would use a regex, especially if the file name could contain more than one underscore. Then you can capture the year, month and day and, if necessary, return a DateTime. This way you can make sure that you are extracting the right part of the file name, and that it really matches the pattern you are looking for.

For the template [filename_]YYYYMMDD[.fileExtension] I think something like:

 ^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\..*)?$ 

Then your captured groups will be year, month and day in that order.

Explanation:

^ : start of line.

(?:.*_)? : An optional non-capture group containing any number of characters followed by an underscore.

([0-9]{4}) : A capture group containing exactly four digits.

([0-9]{2}) : A capture group containing exactly two digits (used twice: once for the month, once for the day).

(?:\..*)? : An optional non-capture group containing a dot followed by any number of characters.

$ : end of line.
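As a minimal sketch of how the pattern above could be turned into a DateTime, the snippet below joins the three capture groups and validates them. The names FileNameDates and ExtractDate are hypothetical, and the use of DateTime.TryParseExact with the "yyyyMMdd" format is an assumption about how one might validate the date; it sticks to syntax available in C# 3.0.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class FileNameDates
{
    // Same pattern as above; capture groups 1, 2, 3 are year, month, day.
    private static readonly Regex DatePattern = new Regex(
        @"^(?:.*_)?([0-9]{4})([0-9]{2})([0-9]{2})(?:\..*)?$");

    // Returns the date embedded in the file name, or null if the name does
    // not match the pattern or the digits are not a real calendar date.
    public static DateTime? ExtractDate(string fileName)
    {
        Match match = DatePattern.Match(fileName);
        if (!match.Success)
        {
            return null;
        }

        // Groups[0] is the whole match; Groups[1] alone is only the year.
        string digits = match.Groups[1].Value
                      + match.Groups[2].Value
                      + match.Groups[3].Value;

        DateTime date;
        if (DateTime.TryParseExact(digits, "yyyyMMdd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
        {
            return date;
        }
        return null;
    }
}
```

Because Groups[1] holds only the first capture group, reading it by itself yields just the year (2010 for "x2v_20100326.csv"). Joining groups 1 to 3, or wrapping all eight digits in a single capture group, gives the full 20100326.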

However, I will add that if you are sure that your file names have one and only one underscore, and the date follows the underscore, the code you have is cleaner and will probably be slightly faster than the regular expression. Weigh this against the set of inputs you expect.

+2

The code you have is sufficient if you are sure that the input always follows the format. If it might not, you should add some error handling for the cases where there is no underscore, or where the day or month is not written with two digits (which breaks the 8-character substring), and then validate the result with DateTime.TryParse (or DateTime.TryParseExact for a fixed format such as yyyyMMdd) to make sure it is a real date.

Other options:

  • Regex : overkill for such a well-defined pattern.
  • LINQ : use the SkipWhile , Skip and TakeWhile methods to skip past the underscore and take the digits until a period is encountered. The query is convoluted, and the result still has to be converted back to a string.
  • String.Split : split on { '_', '.' } and use the array element that holds the date.

None of these options will produce code that looks crisper than what you already have, and performance probably won't be better.
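To illustrate the String.Split option, here is a rough sketch that splits on '_' and '.' and returns the first piece that parses as a yyyyMMdd date. The names SplitExample and ExtractDateBySplit are made up for this example, and scanning every piece (rather than taking a fixed array index) is an assumption made so that names without an underscore still work.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original answer.
public static class SplitExample
{
    // Splits the name on '_' and '.', then returns the first 8-character
    // piece that is a valid yyyyMMdd date, or null if there is none.
    public static DateTime? ExtractDateBySplit(string fileName)
    {
        string[] parts = fileName.Split(new char[] { '_', '.' });
        foreach (string part in parts)
        {
            DateTime date;
            if (part.Length == 8 &&
                DateTime.TryParseExact(part, "yyyyMMdd",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out date))
            {
                return date;
            }
        }
        return null;
    }
}
```

For "xxx_20100326.xls" the pieces are "xxx", "20100326" and "xls", so the second piece is the one that parses.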

+2

Your code is fine, except that you should check the return value of IndexOf in case you encounter a file name without an underscore.

 int index = fileName.IndexOf("_");
 if (index != -1)
     return fileName.Substring(index + 1, 8);
 else
     ...

If you want to check whether this is a valid date, you can call DateTime.TryParseExact.
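Putting the two suggestions together, a sketch might look like the following. The class and method names (SubstringExample, TryExtractDate) and the Try-style signature are assumptions for illustration. The extra length check is also an addition, there because Substring throws when fewer than 8 characters follow the underscore.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original answer.
public static class SubstringExample
{
    // Returns true and sets 'date' when the 8 characters after the first
    // underscore form a valid yyyyMMdd date; otherwise returns false.
    public static bool TryExtractDate(string fileName, out DateTime date)
    {
        date = DateTime.MinValue;

        int index = fileName.IndexOf('_');
        // No underscore, or not enough characters after it for YYYYMMDD.
        if (index == -1 || fileName.Length < index + 1 + 8)
        {
            return false;
        }

        string candidate = fileName.Substring(index + 1, 8);
        return DateTime.TryParseExact(candidate, "yyyyMMdd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }
}
```

A caller would declare a DateTime variable, pass it with out, and use it only when the method returns true.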

0

Source: https://habr.com/ru/post/1312823/