Dynamically constructed regular expressions work very slowly!

I generate regular expressions dynamically by walking an XML structure and building up the pattern as I pass through its node types. The resulting regex is part of a layout type that I defined. I then parse a text file in which every line starts with an identifier. The identifier points me to a specific layout, and I try to match the data in that line against the layout's regular expression.

Sounds fine and dandy? The only problem is that matching the strings is very slow. I set the expressions as compiled to try to speed things up a bit, but to no avail. What puzzles me is that these expressions are not that complex. I'm no RegEx guru, but I know enough about them to get things working.

Here is the code that generates the expressions ...

StringBuilder sb = new StringBuilder();
//get layout id and memberkey in there...
sb.Append(@"^([0-9]+)[ \t]{1,2}([0-9]+)"); 
foreach (ColumnDef c in columns)
{
    sb.Append(@"[ \t]{1,2}");
    switch (c.Variable.PrimType)
    {
        case PrimitiveType.BIT:
            sb.Append("(0|1)");
            break;
        case PrimitiveType.DATE:
            sb.Append(@"([0-9]{2}/[0-9]{2}/[0-9]{4})");
            break;
        case PrimitiveType.FLOAT:
            sb.Append(@"([-+]?[0-9]*\.?[0-9]+)");
            break;
        case PrimitiveType.INTEGER:
            sb.Append(@"([0-9]+)");
            break;
        case PrimitiveType.STRING:
            sb.Append(@"([a-zA-Z0-9]*)");
            break;
    }
}
sb.Append("$");
_pattern = new Regex(sb.ToString(), RegexOptions.Compiled);

The actual slow part ...

public System.Text.RegularExpressions.Match Match(string input)
{
    if (input == null)
       throw new ArgumentNullException("input");

    return _pattern.Match(input);
}

A typical "_pattern" may have about 40-50 columns. I'll spare you the whole pattern. I wrap each case in a capturing group so that I can enumerate the groups in the Match object later.

Any tips or modifications that could help significantly? Or is this slowness to be expected?

EDIT FOR CLARITY: Sorry, I don't think I was clear enough the first time.

XML . . , , , , . , , , .


50 CSV ( ) ?

, \t. . ColumnDef, , .

: , , , , .

Edit2: , , () , , , " Sytax 30 12, ."


:

  • Use [01] instead of (0|1)
  • Use non-capturing groups (?: expr ) instead of capturing groups ( )
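
The two bullets above appear to contrast a character class, [01], with the alternation (0|1), and a non-capturing group, (?: expr ), with a capturing one, ( ). Here is a minimal sketch of the difference. The class name PatternTweaks and both patterns are made up for illustration and are not part of the original code:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternTweaks
{
    // Character class instead of the alternation (0|1); the outer
    // parentheses still capture the value so it can be read back later.
    public static readonly Regex Bit = new Regex(@"^([01])$");

    // The inner (?: ... ) only groups "ab" for the + quantifier and does not
    // create a numbered group; the outer ( ... ) is the single capture.
    public static readonly Regex Repeated = new Regex(@"^((?:ab)+)$");

    public static void Main()
    {
        // Prints the captured bit.
        Console.WriteLine(Bit.Match("1").Groups[1].Value);

        // Groups[0] is the whole match, Groups[1] the only capture.
        Console.WriteLine(Repeated.Match("abab").Groups.Count);
    }
}
```

Since the question reads every column back from the Match object, the per-column capturing groups probably have to stay. The non-capturing form would only apply where parentheses are used purely for grouping.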

, , ?


, . , , .

, , , . , , 100 , .

  • Regex: "^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+(?:[a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)$"

  • : "www.stackoverflow.com"

    • , , 10000 : 0.0018
    • , , 10000 : 0.0021
    • , , 10 000 : 0.0287
    • , , 10000 : 4.8144

, 10 000 . .

  • , , 1,000,000 : 0.00137
  • , , 1,000,000 : 0.00225

. StringBuilder .

, ( ), , - .

... , . , ( ).

, , , ( XML ?) , , , .


50 . , , .

  • , , - .
  • , , .
  • , Ants Profiler, .

lexer .

, , , , . , XML .

- , , 5-10 .

PrimitiveType [] xml GetValues ​​.

, .

"ScanXYZ" . . .

// ScanBit, ScanDate, ScanFloat, ScanInt, ScanString and EatTabs are helper
// methods that are not shown here.
public IEnumerable<object[]> GetValues(TextReader reader, PrimitiveType[] schema)
{
    // Peek() returns -1 at end of input, so >= 0 means there is more data.
    while (reader.Peek() >= 0)
    {
        // Read one value per schema column, in order.
        var values = new object[schema.Length];
        for (int i = 0; i < schema.Length; ++i)
        {
            switch (schema[i])
            {
                case PrimitiveType.BIT:
                    values[i] = ScanBit(reader);
                    break;
                case PrimitiveType.DATE:
                    values[i] = ScanDate(reader);
                    break;
                case PrimitiveType.FLOAT:
                    values[i] = ScanFloat(reader);
                    break;
                case PrimitiveType.INTEGER:
                    values[i] = ScanInt(reader);
                    break;
                case PrimitiveType.STRING:
                    values[i] = ScanString(reader);
                    break;
            }
        }

        // Skip trailing separators before the end of the line.
        EatTabs(reader);

        if (reader.Peek() == '\n')
        {
            reader.Read(); // consume the line terminator
        }
        else if (reader.Peek() >= 0)
        {
            throw new Exception("Extra junk detected!");
        }

        yield return values;
    }
}
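
GetValues relies on ScanXYZ helpers and EatTabs, which are not shown. Here is a minimal sketch of what three of them might look like, assuming fields are separated by spaces or tabs as in the question's pattern. The class name Scanners and the method bodies are hypothetical, not the answerer's code:

```csharp
using System;
using System.IO;
using System.Text;

public static class Scanners
{
    // Skips the spaces and tabs that separate fields.
    public static void EatTabs(TextReader reader)
    {
        while (reader.Peek() == ' ' || reader.Peek() == '\t')
            reader.Read();
    }

    // Reads a run of decimal digits after any leading separators.
    public static int ScanInt(TextReader reader)
    {
        EatTabs(reader);
        var digits = new StringBuilder();
        while (reader.Peek() >= '0' && reader.Peek() <= '9')
            digits.Append((char)reader.Read());
        if (digits.Length == 0)
            throw new FormatException("Expected an integer.");
        return int.Parse(digits.ToString());
    }

    // Reads a single 0 or 1 after any leading separators.
    public static bool ScanBit(TextReader reader)
    {
        EatTabs(reader);
        int c = reader.Read();
        if (c != '0' && c != '1')
            throw new FormatException("Expected 0 or 1.");
        return c == '1';
    }
}
```

For example, with `new StringReader("42\t1\n")`, calling ScanInt and then ScanBit consumes both fields and leaves the reader positioned at the newline. Each helper looks only at the next character and never backs up, so no backtracking is involved.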
