How to parse a text table in C++

Question


I am trying to parse a table stored in a text file using ifstream, and to evaluate or manipulate each record. However, I am having trouble working out how to approach this because some fields are missing from individual records. Consider the following table:

NEW    VER    ID   NAME
1      2a     4    "ITEM ONE" (2001)
       1      7    "2 ITEM" (2002) {OCT}
       1.1    10   "SOME ITEM 3" (2003)
       1      12   "DIFFERENT ITEM 4" (2004)
1      a4     16   "ITEM5" (2005) {DEC}

As you can see, sometimes there is nothing in the "NEW" column. I want to record the ID, the name, the year (in parentheses), and whether or not braces follow.

When I started, I looked for a "split" function, but I realized it would be a bit more complicated than that because of the missing elements described above and the way the sections need to be separated.

The only thing I can think of is reading each line word by word, keeping track of the last number I saw. As soon as I hit a quotation mark, I note that the last number I saw was the ID (if I used something like a split, it would be the array position right before the quotation mark), then record everything up to the next quotation mark (the name), and finally start looking for parentheses and braces for the other information. However, this seems really primitive, and I'm looking for a better way to do it.

I am doing this to improve my C++ skills and to work with larger existing datasets, so I would like to use C++ if possible. If another language (I'm looking at Perl or Python) makes this trivially simple, though, I could just learn how to interface that language with C++. All I'm trying to do right now is sift through the data, which will eventually become objects in C++ anyway, so I would still have a chance to improve my C++ skills.

EDIT: I also realize that I could end up using just regex, but I would like to try different file and string manipulation methods if possible.

+4

c++ string file-io tabular

noisesolo Nov 08 '10 at 20:07


2 answers

Something like this:

Read the first line, find "ID" in it, and store its index.
Read each data line with std::getline().
Create a substring from the data line, starting at the index where you found "ID" in the header line. Use it to initialize a std::istringstream.
Read the ID with iss >> an_int.
Find the first ". Find the second "; the name is the text between them. Find the ( and remember its index. Find the ) and remember that index, too. Create a substring from the characters between those two indices and use it to initialize another std::istringstream. Read the number (the year) from this stream.
Search for the braces.
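
The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the Record struct and the function names find_id_column and parse_row are made up for illustration, it assumes C++17 (for std::optional), and it assumes the ID starts at the same column as the "ID" heading on every row.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical record type; the field names are illustrative only.
struct Record {
    int id = 0;
    std::string name;       // text between the two quotation marks
    int year = 0;           // number between ( and )
    bool has_note = false;  // true when a {...} block follows the year
    std::string note;       // text inside the braces, empty if absent
};

// Returns the index of "ID" in the header line, or std::string::npos.
std::size_t find_id_column(const std::string& header) {
    return header.find("ID");
}

// Parses one data line. Returns std::nullopt if the line does not match
// the expected shape (assumption: ID is aligned under the "ID" heading).
std::optional<Record> parse_row(const std::string& line, std::size_t id_col) {
    if (id_col >= line.size()) return std::nullopt;

    Record rec;

    // Read the ID from the substring that starts at the ID column.
    std::istringstream id_stream(line.substr(id_col));
    if (!(id_stream >> rec.id)) return std::nullopt;

    // The name sits between the first and second quotation marks.
    const std::size_t q1 = line.find('"', id_col);
    if (q1 == std::string::npos) return std::nullopt;
    const std::size_t q2 = line.find('"', q1 + 1);
    if (q2 == std::string::npos) return std::nullopt;
    rec.name = line.substr(q1 + 1, q2 - q1 - 1);

    // The year sits between ( and ), after the closing quotation mark.
    const std::size_t open = line.find('(', q2);
    if (open == std::string::npos) return std::nullopt;
    const std::size_t close = line.find(')', open);
    if (close == std::string::npos) return std::nullopt;
    std::istringstream year_stream(line.substr(open + 1, close - open - 1));
    if (!(year_stream >> rec.year)) return std::nullopt;

    // Optional {...} block after the year.
    const std::size_t lb = line.find('{', close);
    if (lb != std::string::npos) {
        const std::size_t rb = line.find('}', lb);
        if (rb != std::string::npos) {
            rec.has_note = true;
            rec.note = line.substr(lb + 1, rb - lb - 1);
        }
    }
    return rec;
}
```

In a real program you would call std::getline() once for the header, pass it to find_id_column, and then call parse_row for each remaining line, skipping rows where it returns std::nullopt.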

0

sbi Nov 08 '10 at 20:23


Steve Townsend · Accepted Answer · 2010-11-08T20:12:20+0000

If the column offsets are truly fixed (no tabs, just true space characters, i.e. 0x20), I would read the file one line at a time (std::getline) and split each line at the fixed offsets into a set of four strings (std::string::substr).

Then post-process each set of four strings as required.
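
A minimal sketch of that fixed-offset split, assuming the column start offsets are known and in ascending order. The helper names trim_spaces and split_fixed are hypothetical, and the offsets are passed in as a parameter rather than baked into the function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Strips leading and trailing spaces (0x20 only, matching the "no tabs" assumption).
std::string trim_spaces(const std::string& s) {
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Splits a line at the given column start offsets (assumed ascending).
// Field i spans [offsets[i], offsets[i + 1]); the last field runs to end of line.
std::vector<std::string> split_fixed(const std::string& line,
                                     const std::vector<std::size_t>& offsets) {
    std::vector<std::string> fields;
    fields.reserve(offsets.size());
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        const std::size_t begin = offsets[i];
        if (begin >= line.size()) {
            fields.emplace_back();  // line is too short: the field is empty
            continue;
        }
        const std::size_t len = (i + 1 < offsets.size())
                                    ? offsets[i + 1] - begin
                                    : std::string::npos;
        // substr clamps len to the end of the string.
        fields.push_back(trim_spaces(line.substr(begin, len)));
    }
    return fields;
}
```

For a layout whose columns start at 0, 7, 14 and 19 (an assumed layout; the real offsets depend on the file), split_fixed(line, {0, 7, 14, 19}) would yield the NEW, VER, ID and NAME fields, with an empty string wherever a column is blank.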

I would not hard-code the offsets. Instead, store them in a separate input file that describes the format of the input, like a table description in SQL Server or another database.
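
One way to keep the offsets out of the code, as a rough sketch: read them from a small description file. The file format used here (one "NAME OFFSET" pair per line, with # starting a comment line) and the names Column and read_layout are assumptions for illustration, not an established convention.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical column descriptor: a name and the 0-based offset where it starts.
struct Column {
    std::string name;
    std::size_t offset = 0;
};

// Reads one "NAME OFFSET" pair per line from the description stream.
// Blank lines, lines starting with '#', and malformed lines are skipped.
std::vector<Column> read_layout(std::istream& in) {
    std::vector<Column> layout;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        Column col;
        if (fields >> col.name >> col.offset) {
            layout.push_back(col);
        }
    }
    return layout;
}
```

Because read_layout takes a std::istream&, it can be fed a std::ifstream opened on the description file, or a std::istringstream when testing.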
