How to parse a text file in C#

How to parse a text file in C#?

+4
10 answers

Take a look at this interesting approach: LINQ to Text Files is very nice. All you need is a method returning IEnumerable<string> that yields each line from file.ReadLine(), and then you run your query over it.

Here is another article that better explains the same technique.
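A minimal sketch of that technique, not taken from the linked articles: an iterator method wraps ReadLine(), and an ordinary LINQ query runs over the result. The class name LineSource, the sample data, and the "name,age" line format are assumptions made only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineSource
{
    // Lazily yields one line at a time, so the whole file is never held in memory.
    public static IEnumerable<string> ReadLines(string path)
    {
        using (TextReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }

    public static void Main()
    {
        // Hypothetical sample data written to a temp file for the demo.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "alice,30", "bob,17", "carol,42" });

        // An ordinary LINQ query over the lines of the file.
        var adults =
            from line in ReadLines(path)
            let parts = line.Split(',')
            where int.Parse(parts[1]) >= 18
            select parts[0];

        // The query is lazy: the file is only read while this loop runs.
        foreach (string name in adults)
        {
            Console.WriteLine(name);
        }

        File.Delete(path);
    }
}
```

On .NET 4 and later, the built-in File.ReadLines(path) gives you the same lazy IEnumerable<string> without writing the iterator yourself.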

+8
    using (TextReader rdr = new StreamReader(fullFilePath))
    {
        string line;
        while ((line = rdr.ReadLine()) != null)
        {
            // use line here
        }
    }

Set the variable "fullFilePath" to the full path of the file, for example C:\Temp\myTextFile.txt.

+5

The algorithm may look like this:

  • Open the text file
  • For each line in the file:
      • Parse the line

There are several approaches to parsing a line.

The simplest one for a beginner is to use the String methods.

System.String on MSDN
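As a rough sketch of the String-methods approach, here is a parser for lines of the form "key = value". The line format, the "#" comment convention, and the name KeyValueParser are assumptions for the example, not something from the question.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueParser
{
    // Parses one "key = value" line using only String methods.
    // Returns false for blank lines, "#" comments, and lines without a key.
    public static bool TryParseLine(string line, out KeyValuePair<string, string> pair)
    {
        pair = default(KeyValuePair<string, string>);
        if (line == null) return false;

        string trimmed = line.Trim();
        if (trimmed.Length == 0 || trimmed.StartsWith("#")) return false;

        int eq = trimmed.IndexOf('=');
        if (eq <= 0) return false; // no '=' at all, or nothing before it

        string key = trimmed.Substring(0, eq).Trim();
        string value = trimmed.Substring(eq + 1).Trim();
        pair = new KeyValuePair<string, string>(key, value);
        return true;
    }

    public static void Main()
    {
        string[] lines = { "# settings", "name = Ada", "", "lang=C#" };
        foreach (string line in lines)
        {
            KeyValuePair<string, string> pair;
            if (TryParseLine(line, out pair))
            {
                Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
            }
        }
    }
}
```

Trim, IndexOf, Substring and Split cover most simple line formats before you need anything heavier.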

If you need more power, you can use the System.Text.RegularExpressions namespace to parse your text.

RegEx on MSDN
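A small sketch of the regular-expression approach, assuming a made-up log format of "date LEVEL message". The pattern and the name LogLineParser are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineParser
{
    // Named groups make the captured pieces easy to pull out afterwards.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<date>\d{4}-\d{2}-\d{2})\s+(?<level>[A-Z]+)\s+(?<message>.*)$");

    // Returns true and fills the out parameters when the line matches the pattern.
    public static bool TryParse(string line, out string date, out string level, out string message)
    {
        Match m = LinePattern.Match(line ?? string.Empty);
        date = m.Success ? m.Groups["date"].Value : null;
        level = m.Success ? m.Groups["level"].Value : null;
        message = m.Success ? m.Groups["message"].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        string date, level, message;
        if (TryParse("2024-01-15 ERROR Disk full", out date, out level, out message))
        {
            Console.WriteLine("{0} | {1} | {2}", date, level, message);
        }
    }
}
```

Keeping the Regex in a static readonly field avoids rebuilding it for every line of the file.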

+3

You might want to use a helper class like the one described at http://www.blackbeltcoder.com/Articles/strings/a-text-parsing-helper-class .

+1

After years of parsing CSV files, including broken ones and ones with extreme edge cases, here is my code, which passes almost all of my unit tests:

    /// <summary>
    /// Read in a line of text, and use the Add() function to add these items to the current CSV structure
    /// </summary>
    /// <param name="s"></param>
    public static bool TryParseCSVLine(string s, char delimiter, char text_qualifier, out string[] array)
    {
        bool success = true;
        List<string> list = new List<string>();
        StringBuilder work = new StringBuilder();
        for (int i = 0; i < s.Length; i++) {
            char c = s[i];

            // If we are starting a new field, is this field text qualified?
            if ((c == text_qualifier) && (work.Length == 0)) {
                int p2;
                while (true) {
                    p2 = s.IndexOf(text_qualifier, i + 1);

                    // for some reason, this text qualifier is broken
                    if (p2 < 0) {
                        work.Append(s.Substring(i + 1));
                        i = s.Length;
                        success = false;
                        break;
                    }

                    // Append this qualified string
                    work.Append(s.Substring(i + 1, p2 - i - 1));
                    i = p2;

                    // If this is a double quote, keep going!
                    if (((p2 + 1) < s.Length) && (s[p2 + 1] == text_qualifier)) {
                        work.Append(text_qualifier);
                        i++;

                    // otherwise, this is a single qualifier, we're done
                    } else {
                        break;
                    }
                }

            // Does this start a new field?
            } else if (c == delimiter) {
                list.Add(work.ToString());
                work.Length = 0;

                // Test for special case: when the user has written a casual comma, space, and text qualifier, skip the space
                // Checks if the second parameter of the if statement will pass through successfully
                // eg "bob", "mary", "bill"
                if (i + 2 <= s.Length - 1) {
                    if (s[i + 1].Equals(' ') && s[i + 2].Equals(text_qualifier)) {
                        i++;
                    }
                }
            } else {
                work.Append(c);
            }
        }
        list.Add(work.ToString());

        // If we have nothing in the list, and it possible that this might be a tab delimited list, try that before giving up
        // (ParseLine and the DEFAULT_* constants are not shown in this answer)
        if (list.Count == 1 && delimiter != DEFAULT_TAB_DELIMITER) {
            string[] tab_delimited_array = ParseLine(s, DEFAULT_TAB_DELIMITER, DEFAULT_QUALIFIER);
            if (tab_delimited_array.Length > list.Count) {
                array = tab_delimited_array;
                return success;
            }
        }

        // Return the array we parsed
        array = list.ToArray();
        return success;
    }

However, this function does not actually parse all valid CSV files! Some files have newlines embedded in their fields, so you need to let a stream reader consume multiple lines in order to return a single array. Here is the function that does this:

    /// <summary>
    /// Parse a line whose values may include newline symbols or CR/LF
    /// </summary>
    /// <param name="sr"></param>
    /// <returns></returns>
    public static string[] ParseMultiLine(StreamReader sr, char delimiter, char text_qualifier)
    {
        StringBuilder sb = new StringBuilder();
        string[] array = null;
        while (!sr.EndOfStream) {

            // Read in a line
            sb.Append(sr.ReadLine());

            // Does it parse?
            string s = sb.ToString();
            if (TryParseCSVLine(s, delimiter, text_qualifier, out array)) {
                return array;
            }

            // ReadLine() strips the line terminator, so put a newline back before
            // appending the rest of the unfinished quoted field (the original
            // terminator may have been CR/LF; '\n' is used here for simplicity)
            sb.Append('\n');
        }

        // Fails to parse - return the best array we were able to get
        return array;
    }

For reference, I posted my open source CSV code at code.google.com.

+1

If you have anything more than a trivial language, use a parser generator. It drove me nuts, but I have heard good things about ANTLR. (Note: get the manual and read it before you start. If you have used a different parser generator before, you will not get it right straight off the bat; at least I didn't.)

There are other tools.

0

Without knowing what kind of text file you are talking about, it is difficult to answer. However, the FileHelpers library has a wide range of tools that help with fixed-length file formats, multi-record files, delimited files, etc.

0

What do you mean by parse? Parsing usually means splitting the input into tokens, which you might do if you are trying to implement a programming language. If you just want to read the contents of a text file, see System.IO.FileInfo.
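To make "splitting input into tokens" concrete, here is a very small hand-written tokenizer. The token rules (identifiers, integer literals, single-character symbols) and the name Tokenizer are assumptions chosen for the example, not a complete lexer for any real language.

```csharp
using System;
using System.Collections.Generic;

public static class Tokenizer
{
    // Splits the input into identifiers, integer literals, and single-character symbols.
    // Whitespace separates tokens and is otherwise discarded.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c))
            {
                i++;
                continue;
            }

            int start = i;
            if (char.IsLetter(c) || c == '_')
            {
                // identifier: letters, digits, underscores
                while (i < input.Length && (char.IsLetterOrDigit(input[i]) || input[i] == '_')) i++;
            }
            else if (char.IsDigit(c))
            {
                // integer literal: a run of digits
                while (i < input.Length && char.IsDigit(input[i])) i++;
            }
            else
            {
                // anything else is a one-character symbol
                i++;
            }
            tokens.Add(input.Substring(start, i - start));
        }
        return tokens;
    }

    public static void Main()
    {
        foreach (string token in Tokenize("total = price * 12"))
        {
            Console.WriteLine(token);
        }
    }
}
```

For anything beyond a toy grammar, a parser generator or a dedicated library will usually be less work than extending a loop like this.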

0

A slight improvement on Pero's answer:

    // the verbatim string (@"...") keeps the backslash from being read as an escape sequence
    FileInfo txtFile = new FileInfo(@"c:\myfile.txt");
    if (!txtFile.Exists)
    {
        // error handling
    }

    using (TextReader rdr = txtFile.OpenText())
    {
        // use the text file as Pero suggested
    }

The FileInfo class gives you the ability to "do things" with a file before you start reading it. You can also pass it between functions as a better abstraction of the file's location (instead of using the full path string). FileInfo canonicalizes the path so that it is absolutely correct (for example, turning / into \ where necessary) and lets you extract additional data about the file: parent directory, extension, name only, permissions, etc.
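A short sketch of the kind of information FileInfo exposes before you open the file. The helper name Describe and the temp file are assumptions made only for the demo.

```csharp
using System;
using System.IO;

public static class FileInfoDemo
{
    // Builds a one-line summary from properties FileInfo provides without opening the file.
    public static string Describe(FileInfo file)
    {
        return string.Format(
            "{0} (extension '{1}') in {2}, exists: {3}",
            file.Name, file.Extension, file.DirectoryName, file.Exists);
    }

    public static void Main()
    {
        // Hypothetical file created just for the demo.
        string path = Path.GetTempFileName();
        FileInfo info = new FileInfo(path);

        Console.WriteLine(Describe(info));
        Console.WriteLine("Full path: " + info.FullName);
        Console.WriteLine("Size in bytes: " + info.Length);

        info.Delete();
    }
}
```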

0

First, make sure you have the following namespaces:

    using System;                          // String, Array
    using System.Collections;              // ArrayList
    using System.Data;
    using System.IO;
    using System.Text.RegularExpressions;

Then we will create a function that parses any CSV input string into a DataTable:

    public DataTable ParseCSV(string inputString)
    {
        DataTable dt = new DataTable();

        // declare the Regular Expression that will match versus the input string
        Regex re = new Regex("((?<field>[^\",\\r\\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

        ArrayList colArray = new ArrayList();
        ArrayList rowArray = new ArrayList();

        int colCount = 0;
        int maxColCount = 0;
        string rowbreak = "";
        string field = "";

        MatchCollection mc = re.Matches(inputString);

        foreach (Match m in mc)
        {
            // retrieve the field and replace two double-quotes with a single double-quote
            field = m.Result("${field}").Replace("\"\"", "\"");

            rowbreak = m.Result("${rowbreak}");

            if (field.Length > 0)
            {
                colArray.Add(field);
                colCount++;
            }

            if (rowbreak.Length > 0)
            {
                // add the column array to the row Array List
                rowArray.Add(colArray.ToArray());

                // create a new Array List to hold the field values
                colArray = new ArrayList();

                if (colCount > maxColCount)
                    maxColCount = colCount;

                colCount = 0;
            }
        }

        if (rowbreak.Length == 0)
        {
            // this is executed when the last line doesn't
            // end with a line break
            rowArray.Add(colArray.ToArray());
            if (colCount > maxColCount)
                maxColCount = colCount;
        }

        // create the columns for the table
        for (int i = 0; i < maxColCount; i++)
            dt.Columns.Add(String.Format("col{0:000}", i));

        // convert the row Array List into an Array object for easier access
        Array ra = rowArray.ToArray();
        for (int i = 0; i < ra.Length; i++)
        {
            // create a new DataRow
            DataRow dr = dt.NewRow();

            // convert the column Array List into an Array object for easier access
            Array ca = (Array)(ra.GetValue(i));

            // add each field into the new DataRow
            for (int j = 0; j < ca.Length; j++)
                dr[j] = ca.GetValue(j);

            // add the new DataRow to the DataTable
            dt.Rows.Add(dr);
        }

        // in case no data was parsed, create a single column
        if (dt.Columns.Count == 0)
            dt.Columns.Add("NoData");

        return dt;
    }

Now that we have a parser to convert the string to a DataTable, all we need is a function that reads the contents of the CSV file and passes them to our ParseCSV function:

    public DataTable ParseCSVFile(string path)
    {
        string inputString = "";

        // check that the file exists before opening it
        if (File.Exists(path))
        {
            // the using block closes the reader even if ReadToEnd throws
            using (StreamReader sr = new StreamReader(path))
            {
                inputString = sr.ReadToEnd();
            }
        }

        return ParseCSV(inputString);
    }

And now you can easily populate the DataGrid with data coming from a CSV file:

    protected System.Web.UI.WebControls.DataGrid DataGrid1;

    private void Page_Load(object sender, System.EventArgs e)
    {
        // call the parser
        DataTable dt = ParseCSVFile(Server.MapPath("./demo.csv"));

        // bind the resulting DataTable to a DataGrid Web Control
        DataGrid1.DataSource = dt;
        DataGrid1.DataBind();
    }

Congratulations! Now you can parse a CSV file into a DataTable. Good luck with your programming.

0