I need to read a large space-separated text file and count the number of instances of each code in the file. Essentially, these are the results of an experiment run hundreds of thousands of times. The system spits out a text file that looks something like this:
A7PS A8PN A6PP23 ...
There are literally hundreds of thousands of these records, and I need to count the occurrences of each of the codes.
I guess I could just open a StreamReader, go through it line by line, splitting on the space character, check whether each code has already been seen, and add 1 to that code's count, roughly as in the sketch below. However, this seems rather naive given the size of the data.
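Something like this minimal sketch is what I have in mind (the file name results.txt and single-space separation are just assumptions for illustration):

// Naive line-by-line counting with a StreamReader and a Dictionary.
// results.txt is a placeholder path; codes are assumed to be space-separated.
using System;
using System.Collections.Generic;
using System.IO;

class CodeCounter
{
    static void Main()
    {
        var counts = new Dictionary<string, int>();

        using (var reader = new StreamReader("results.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                foreach (var code in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
                {
                    // TryGetValue avoids a second lookup for codes already seen
                    int count;
                    counts.TryGetValue(code, out count);
                    counts[code] = count + 1;
                }
            }
        }

        foreach (var pair in counts)
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}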
Does anyone know of an efficient algorithm to handle this kind of processing?
UPDATE:
OK, so the consensus seems to be that my approach is along the right lines.
What I would be interested to hear is: which is more efficient for reading the file - StreamReader, TextReader, or BinaryReader?
What is the best structure for storing my results - Hashtable, SortedList, or HybridDictionary?
If there are no line breaks in the file (I haven't been given a sample yet), would just splitting the whole thing on spaces be inefficient? (See the chunked-reading sketch below.)
Essentially, I am looking to make it as fast as possible.
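For the no-line-break case, here is a rough sketch of what I imagine chunked reading would look like - assuming codes are separated by whitespace, using an arbitrary 64 KB buffer, and carrying any code that gets cut off at a chunk boundary over to the next read, so the whole file never has to be loaded or split in one go:

// Chunked reading sketch: count whitespace-separated codes without relying on line breaks.
// The buffer size and the results.txt file name below are illustrative assumptions.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class ChunkedCounter
{
    static void Main()
    {
        foreach (var pair in Count("results.txt"))   // placeholder file name
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }

    static Dictionary<string, int> Count(string path)
    {
        var counts = new Dictionary<string, int>();
        var carry = new StringBuilder();      // holds a code cut off at a chunk boundary
        var buffer = new char[64 * 1024];

        using (var reader = new StreamReader(path))
        {
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (char.IsWhiteSpace(buffer[i]))
                    {
                        if (carry.Length > 0)
                        {
                            Add(counts, carry.ToString());
                            carry.Length = 0;
                        }
                    }
                    else
                    {
                        carry.Append(buffer[i]);
                    }
                }
            }
        }

        if (carry.Length > 0)
            Add(counts, carry.ToString());    // flush the final code

        return counts;
    }

    static void Add(Dictionary<string, int> counts, string code)
    {
        int count;
        counts.TryGetValue(code, out count);
        counts[code] = count + 1;
    }
}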
thanks again
c# algorithm parsing text-processing
Chrisca