Let me start with a disclaimer. This is absolutely not a good idea (but it was a fun exercise). The regular expression(s) I'm going to present will parse the test cases in the question, but they are obviously not bulletproof. Using a parser can save you a lot of headaches later. I tried to find a parser for VBA, but came up empty-handed (and I guess everyone else has too).
Regex
For this to work well, you need to have some control over the VBA code. If you don't, you really need to look at a parser instead of using regular expressions. However, judging by what you have already said, you may have a little control, so perhaps this will help.
So, for this I had to split the regular expression into two separate regular expressions. The reason is that the .NET Regex library does not handle named capture groups inside a repeating group well: you cannot easily tell which repetition a nested capture belongs to.
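To make that limitation concrete, here is a minimal sketch using a made-up toy pattern (the `item`, `name`, `suffix` and `value` groups and the `RepeatedGroupDemo` class are purely illustrative, not part of the regexes in this answer). .NET does record every repetition of a group in `Group.Captures`, but each group gets its own flat list, so an optional nested group ends up with fewer entries than the outer one.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedGroupDemo
{
    // Hypothetical toy pattern (not one of the answer's regexes):
    // one or more "name[suffix]=digits" items separated by ", ".
    private static readonly Regex Items = new Regex(
        @"^(?<item>(?<name>[a-z]+)(?<suffix>[%$])?=(?<value>\d+)(?:,\s)?)+$");

    // Text of every repetition of the outer "item" group.
    public static List<string> ItemCaptures(string input)
    {
        var result = new List<string>();
        Match match = Items.Match(input);
        if (match.Success)
        {
            foreach (Capture capture in match.Groups["item"].Captures)
                result.Add(capture.Value);
        }
        return result;
    }

    // Number of times the optional nested "suffix" group took part in the match.
    public static int SuffixCaptureCount(string input)
    {
        return Items.Match(input).Groups["suffix"].Captures.Count;
    }
}
```

For the input `foo=1, bar$=2`, `ItemCaptures` yields two items while `SuffixCaptureCount` is 1, and nothing in that flat list says the `$` belongs to `bar` unless you start comparing capture positions. Running a second regex over each captured item avoids that bookkeeping.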
The first one captures the line and starts the parsing: it puts each variable (with its value) into one group, and the second regex then parses those. Just FYI, the regexes make use of negative lookaheads.
^(?:(?<Accessibility>Private|Public|Global)\s)?Const\s(?<variable>[a-zA-Z][a-zA-Z0-9_]*(?:[%&@!#$])?(?:\sAs)?\s(?:(?:[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s[^',]+(?:(?:(?!").)+")?(?:,\s)?){1,}(?:'(?<comment>.+))?$
Here is the regular expression for parsing the individual variables:
(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&@!#$])?(?:\sAs)?\s(?:(?<reference>[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s(?<value>[^',]+(?:(?:(?!").)+")?),?
And here is some C# code you can drop in to test everything. This should make it easy to check any edge cases that you have.
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            List<String> test = new List<string>
            {
                "Const foo = 123",
                "Const foo$ = \"123\"",
                "Const foo As String = \"1'2'3\"",
                "Const foo As String = \"123\"",
                "Private Const foo = 123",
                "Public Const foo As Integer = 123",
                "Global Const foo% = 123",
                "Const foo = 123 'this comment is included as part of the value",
                "Const foo = 123, bar = 456",
                "'Const foo As String = \"123\"",
            };

            foreach (var str in test)
                Parse(str);

            Console.Read();
        }

        // First pass: matches the whole Const line and captures each variable (with its value).
        private static Regex parse = new Regex(@"^(?:(?<Accessibility>Private|Public|Global)\s)?Const\s(?<variable>[a-zA-Z][a-zA-Z0-9_]*(?:[%&@!#$])?(?:\sAs)?\s(?:(?:[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s[^',]+(?:(?:(?!"").)+"")?(?:,\s)?){1,}(?:'(?<comment>.+))?$", RegexOptions.Compiled | RegexOptions.Singleline, new TimeSpan(0, 0, 20));

        // Second pass: splits one captured variable into identifier, specifier, reference and value.
        private static Regex variableRegex = new Regex(@"(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&@!#$])?(?:\sAs)?\s(?:(?<reference>[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s(?<value>[^',]+(?:(?:(?!"").)+"")?),?", RegexOptions.Compiled | RegexOptions.Singleline, new TimeSpan(0, 0, 20));

        public static void Parse(String str)
        {
            Console.WriteLine(String.Format("Parsing: {0}", str));

            var match = parse.Match(str);
            if (match.Success)
            {
                // Private/Public/Global
                var accessibility = match.Groups["Accessibility"].Value;

                // Since we defined this with at least one capture, there should always be something here.
                foreach (Capture variable in match.Groups["variable"].Captures)
                {
                    //Console.WriteLine(variable);
                    var variableMatch = variableRegex.Match(variable.Value);
                    if (variableMatch.Success)
                    {
                        Console.WriteLine(String.Format("Identifier: {0}", variableMatch.Groups["identifier"].Value));

                        if (variableMatch.Groups["specifier"].Success)
                            Console.WriteLine(String.Format("specifier: {0}", variableMatch.Groups["specifier"].Value));

                        if (variableMatch.Groups["reference"].Success)
                            Console.WriteLine(String.Format("reference: {0}", variableMatch.Groups["reference"].Value));

                        Console.WriteLine(String.Format("value: {0}", variableMatch.Groups["value"].Value));
                        Console.WriteLine("");
                    }
                    else
                    {
                        Console.WriteLine(String.Format("FAILED VARIABLE: {0}", variable.Value));
                    }
                }

                if (match.Groups["comment"].Success)
                {
                    Console.WriteLine(String.Format("Comment: {0}", match.Groups["comment"].Value));
                }
            }
            else
            {
                Console.WriteLine(String.Format("FAILED: {0}", str));
            }

            Console.WriteLine("+++++++++++++++++++++++++++++++++++++++++++++");
            Console.WriteLine("");
        }
    }
The C# code is just what I used to test my theory, so I apologize for the craziness in it.
For completeness, here is a small sample of the output. If you run the code you will get more output, but this shows directly that it can handle the situations you asked about.
    Parsing: Const foo = 123 'this comment is included as part of the value
    Identifier: foo
    value: 123
    Comment: this comment is included as part of the value

    Parsing: Const foo = 123, bar = 456
    Identifier: foo
    value: 123
    Identifier: bar
    value: 456
What it handles
Here are the main cases that I can think of that you are probably interested in. It should still handle everything you had before, as I just added to the regular expression that you provided.
- Comments
- Multiple variable declarations on the same line
- An apostrophe (the comment character) inside a string value, i.e. foo = "She's awesome"
- If a line starts with a comment, the line should be ignored
What it does not handle
The only thing I really didn't handle was spacing, but it should not be difficult to add yourself if you need it. For example, when several variables are declared on one line, there must be a space after the comma, i.e. (VALID: foo = 123, foobar = 124) (INVALID: foo = 123,foobar = 124)
You won't get much leniency on the format with this, but there is not much you can do about that when using regular expressions.
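If the missing space after a comma is the only thing in the way, one possible tweak is to change `,\s` to `,\s*` in the first regex. The sketch below does only that; the `RelaxedSpacing` class and `CountVariables` method are hypothetical names for illustration, and I have only reasoned through the simple cases shown, so treat it as a starting point rather than a tested fix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RelaxedSpacing
{
    // Same as the first regex in this answer, except ",\s" became ",\s*"
    // so the whitespace after a comma is optional. Nothing else is relaxed:
    // the spaces around "=" are still required.
    private static readonly Regex RelaxedLine = new Regex(
        @"^(?:(?<Accessibility>Private|Public|Global)\s)?Const\s(?<variable>[a-zA-Z][a-zA-Z0-9_]*(?:[%&@!#$])?(?:\sAs)?\s(?:(?:[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s[^',]+(?:(?:(?!"").)+"")?(?:,\s*)?){1,}(?:'(?<comment>.+))?$",
        RegexOptions.Singleline, new TimeSpan(0, 0, 20));

    // Returns how many variables were captured on the line, or -1 if the line did not match.
    public static int CountVariables(string line)
    {
        Match match = RelaxedLine.Match(line);
        return match.Success ? match.Groups["variable"].Captures.Count : -1;
    }
}
```

With this change, `RelaxedSpacing.CountVariables("Const foo = 123,bar = 456")` finds two variables, where the stricter pattern rejects the line. The captured text can still be passed to the second regex unchanged, since that one already treats the trailing comma as optional.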
Hope this helps, and if you need more explanation of how this works, just let me know. Just know that this is a bad idea. You will run into situations that the regular expression cannot handle. If I were you, I would consider writing a simple parser, which would give you more flexibility in the long run. Good luck.
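In case the simple-parser route is of interest, here is a rough sketch of where it could start. It only covers the text that follows the `Const` keyword, and the `ConstLineScanner` class and `SplitDeclarations` method are names made up for this example. It walks the characters once, tracks whether it is inside a string literal, cuts the line at the first apostrophe outside a string, and splits on commas outside strings. It assumes VBA's convention of writing a quote inside a string as a doubled quote, which the simple toggle happens to handle.

```csharp
using System.Collections.Generic;

public static class ConstLineScanner
{
    // Splits the text after "Const" into individual declarations and
    // returns the trailing comment (or null if there is none) via "comment".
    public static List<string> SplitDeclarations(string body, out string comment)
    {
        var parts = new List<string>();
        comment = null;
        bool inString = false;
        int start = 0;

        for (int i = 0; i < body.Length; i++)
        {
            char c = body[i];
            if (c == '"')
            {
                // A doubled quote inside a string toggles twice, so we stay inside it.
                inString = !inString;
            }
            else if (!inString && c == '\'')
            {
                // Apostrophe outside a string: the rest of the line is a comment.
                comment = body.Substring(i + 1).Trim();
                body = body.Substring(0, i);
                break;
            }
            else if (!inString && c == ',')
            {
                // Comma outside a string: end of one declaration.
                parts.Add(body.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }

        parts.Add(body.Substring(start).Trim());
        return parts;
    }
}
```

Each returned piece (for example `foo = 123` or `bar$ = "a,b'c"`) still needs to be split into name, type and value, which could be done with the second regex above or with a few more lines of scanning. The point of the design is that spacing and quoting rules live in ordinary code, where they are easier to extend than in a single pattern.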