Reading a file line by line in C#

I am trying to read some text files in which each line needs to be processed. At the moment I just use a StreamReader and read each line individually.

I am wondering whether there is a more efficient way (in terms of LoC and readability) to do this using LINQ without compromising performance. The examples I have seen load the entire file into memory and then process it. In this case, however, I do not think that would be very efficient. In the first example the files can get up to about 50 thousand, and in the second example not all lines of the file need to be read (sizes are usually 10 KB).

You could argue that this does not really matter for such small files, but I believe that kind of approach leads to inefficient code.

First example:

    // Open file
    using (var file = System.IO.File.OpenText(_LstFilename))
    {
        // Read file
        while (!file.EndOfStream)
        {
            String line = file.ReadLine();

            // Ignore empty lines
            if (line.Length > 0)
            {
                // Create addon
                T addon = new T();
                addon.Load(line, _BaseDir);

                // Add to collection
                collection.Add(addon);
            }
        }
    }

Second example:

    // Open file
    using (var file = System.IO.File.OpenText(datFile))
    {
        // Compile regexs
        Regex nameRegex = new Regex("IDENTIFY (.*)");

        while (!file.EndOfStream)
        {
            String line = file.ReadLine();

            // Check name
            Match m = nameRegex.Match(line);
            if (m.Success)
            {
                _Name = m.Groups[1].Value;

                // Remove me when other values are read
                break;
            }
        }
    }
Aug 13 '09 at 10:41
4 answers

Using an iterator block, you can easily write a LINQ-based line reader:

    static IEnumerable<SomeType> ReadFrom(string file)
    {
        string line;
        using (var reader = File.OpenText(file))
        {
            // ReadLine returns null at the end of the file
            while ((line = reader.ReadLine()) != null)
            {
                // ParseLine is a placeholder for your own parsing code
                SomeType newRecord = ParseLine(line);
                yield return newRecord;
            }
        }
    }

or, to make Jon happy:

    static IEnumerable<string> ReadFrom(string file)
    {
        string line;
        using (var reader = File.OpenText(file))
        {
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
    ...
    var typedSequence = from line in ReadFrom(path)
                        let record = ParseLine(line)
                        where record.Active // for example
                        select record.Key;

Then you have ReadFrom(...) as a lazily evaluated sequence without buffering, which is ideal for Where and similar operators.
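As a rough illustration of that lazy evaluation, the sketch below applies the same iterator to the second example from the question. The class name LazyLines and the method FindName are hypothetical names chosen for this sketch; the "IDENTIFY (.*)" pattern is taken from the question. Because FirstOrDefault stops pulling lines as soon as it finds a match, the remainder of the file should not be read, and the reader is disposed once the enumeration ends.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyLines
{
    // Same iterator as above: yields one line at a time, without buffering.
    public static IEnumerable<string> ReadFrom(string file)
    {
        using (var reader = File.OpenText(file))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // Hypothetical helper: returns the text captured after "IDENTIFY " on the
    // first matching line, or null if no line matches.
    public static string FindName(string file)
    {
        var nameRegex = new Regex("IDENTIFY (.*)");
        return ReadFrom(file)
            .Select(line => nameRegex.Match(line))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)
            .FirstOrDefault();
    }
}
```

A call such as `_Name = LazyLines.FindName(datFile);` would then replace the explicit loop and break from the question.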

Note that if you use OrderBy or the standard GroupBy, it will have to buffer the data in memory. If you need grouping and aggregation, "PushLINQ" has some fancy code that lets you aggregate the data and then discard it (without buffering). Jon explains it here.

Aug 13 '09 at 10:45

It is simpler to read a line and check whether it is null than to check EndOfStream all the time.

However, I also have a LineReader class in MiscUtil that makes all of this a lot simpler. It basically exposes a file (or a Func<TextReader>) as an IEnumerable<string>, which lets you use LINQ on it. So you can do things like:

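    // The single-argument GetFiles overload takes a directory path, so the
    // search pattern is passed separately; "." (current directory) is assumed here.
    var query = from file in Directory.GetFiles(".", "*.log")
                from line in new LineReader(file)
                where line.Length > 0
                select new AddOn(line); // or whatever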

The heart of LineReader is this implementation of IEnumerable<string>.GetEnumerator:

    public IEnumerator<string> GetEnumerator()
    {
        using (TextReader reader = dataSource())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

Almost all of the rest of the source simply provides flexible ways to set up dataSource (which is a Func<TextReader>).

Aug 13 '09 at 10:45

NOTE: You need to watch out with the IEnumerable<T> solution, as it will keep the file open for the duration of the processing.

For example, with Marc Gravell's answer:

    foreach (var record in ReadFrom("myfile.csv"))
    {
        DoLongProcessOn(record);
    }

the file will remain open for the whole of the processing.
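If holding the file open during slow processing is a concern, one possible workaround is to materialize the lines first and process them afterwards, at the cost of buffering the whole file in memory. The following is only a sketch under that assumption (the file is small enough to buffer); EagerLines and ReadAllThenProcess are hypothetical names, and the process delegate stands in for DoLongProcessOn.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class EagerLines
{
    // Lazy line iterator, as in the earlier answer.
    public static IEnumerable<string> ReadFrom(string file)
    {
        using (var reader = File.OpenText(file))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    // Hypothetical helper: reads every line up front, then processes them.
    // Returns the number of lines processed.
    public static int ReadAllThenProcess(string file, Action<string> process)
    {
        // ToList() enumerates the whole sequence, so the reader is disposed
        // (and the file closed) before any processing starts.
        List<string> lines = ReadFrom(file).ToList();
        foreach (var line in lines)
        {
            process(line);
        }
        return lines.Count;
    }
}
```

This trades the streaming behaviour for a short file lifetime, so it only makes sense when memory use is not the limiting factor.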

Aug 13 '09 at 10:50

Thank you all for your answers! I decided to go with a mixture, mainly based on Marc's answer, since I only need to read lines from a file. I suppose you could argue that separation is needed everywhere, but heh, life is too short!

As for keeping the file open, that will not be a problem in this case, since the code is part of a desktop application.

Finally, I noticed that you all used lowercase string. I know that in Java there is a difference between the capitalized and non-capitalized forms, but I thought that in C# lowercase string was just a reference to capitalized String?

    public void Load(AddonCollection<T> collection)
    {
        // read from file
        var query = from line in LineReader(_LstFilename)
                    where line.Length > 0
                    select CreateAddon(line);

        // add results to collection
        collection.AddRange(query);
    }

    protected T CreateAddon(String line)
    {
        // create addon
        T addon = new T();
        addon.Load(line, _BaseDir);

        return addon;
    }

    protected static IEnumerable<String> LineReader(String fileName)
    {
        String line;
        using (var file = System.IO.File.OpenText(fileName))
        {
            // read each line, ensuring not null (EOF)
            while ((line = file.ReadLine()) != null)
            {
                // return trimmed line
                yield return line.Trim();
            }
        }
    }
Aug 13 '09 at 16:21


