This may not directly answer your question about RegEx performance degradation, which is an interesting one. However, after reading all of the comments and discussion above, I would suggest the following:
Parse the data once, splitting the matched data out into a database table. It looks like you are trying to capture the following fields:
Player_Name | Monetary_Value
If you were to create a database table containing these values for each row, and then catch each new row as it is created, parse it, and append it to the table, you could easily run any analysis or calculation against the data without having to re-parse 25M rows over and over (which is a waste).
In addition, on the first run, if you were to break the 25M records into blocks of 100,000 and run the algorithm 250 times (100,000 x 250 = 25,000,000), you could enjoy all the performance you describe with no slowdown, because you are chunking up the work.
In other words, consider the following:
Create the database table as follows:
CREATE TABLE PlayerActions (
    RowID INT PRIMARY KEY IDENTITY,
    Player_Name VARCHAR(50) NOT NULL,
    Monetary_Value MONEY NOT NULL
)
Create an algorithm that breaks your 25M rows into chunks of 100K. Here is an example that assumes LINQ / EF5.
// Requires: using System; using System.Collections.Generic; using System.Linq;
public void ParseFullDataSet(IEnumerable<String> dataSource)
{
    const int setSize = 100000;
    var rowCount = dataSource.Count();
    // Ceiling division: an extra set is added only when there is a remainder.
    var setCount = (rowCount + setSize - 1) / setSize;
    for (int i = 0; i < setCount; i++)
    {
        var set = dataSource.Skip(i * setSize).Take(setSize);
        ParseSet(set);
    }
}

public void ParseSet(IEnumerable<String> dataSource)
{
    String playerName = String.Empty;
    decimal monetaryValue = 0.0m;
    // ... (remainder of the method body is not shown)
}
Run the above once to load all of the existing data.
Create a hook somewhere that lets you detect when a new row is added. Each time a new row is created, call:
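One thing to watch with the Skip/Take loop above: if dataSource is a lazy sequence (for example, lines streamed from a file), each Skip walks the source again from the beginning. A minimal sketch of a single-pass alternative follows. The Batching class and its Batch method are hypothetical helpers, not part of the original code or of LINQ / EF5.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Hypothetical helper: yields the source as consecutive lists of at most
    // batchSize items, enumerating the source exactly once.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return BatchIterator(source, batchSize);
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        // Emit the final, partially filled batch (if any).
        if (batch.Count > 0) yield return batch;
    }
}
```

With this in place, the body of ParseFullDataSet could be reduced to something like: foreach (var set in Batching.Batch(dataSource, 100000)) ParseSet(set);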
ParseSet(new List<String>() { newValue });
or, if several rows are created at once, call:
ParseSet(newValues);
Now you can do whatever computational analysis or data mining you want on the data, without having to worry about the performance of processing 25M+ rows on the fly.
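As an illustration of the kind of analysis this makes cheap, here is a rough sketch of a per-player total. It uses LINQ over an in-memory collection standing in for the PlayerActions table; the PlayerAction class and the PlayerStats.TotalsByPlayer method are made-up names for this example, and against a real database you would run the equivalent grouping query through whatever data access layer you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type mirroring the PlayerActions columns.
public class PlayerAction
{
    public string Player_Name { get; set; }
    public decimal Monetary_Value { get; set; }
}

public static class PlayerStats
{
    // Sums Monetary_Value for each distinct Player_Name.
    public static Dictionary<string, decimal> TotalsByPlayer(IEnumerable<PlayerAction> rows)
    {
        return rows
            .GroupBy(r => r.Player_Name)
            .ToDictionary(g => g.Key, g => g.Sum(r => r.Monetary_Value));
    }
}
```

Because the rows are already parsed, a query like this never touches the RegEx again.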