How to combine files into one CSV file?

If I have one FOO_1.txt file that contains:

    FOOA
    FOOB
    FOOC
    FOOD
    ...

and many other FOO_files.txt files. Each of them contains:

1110000000 ...

a single line of 0s and 1s, one digit per FOO_1 field (fooa, foob, ...).

Now I want to combine them into one file, FOO_RES.csv, which will have the following format:

    FOOA,1,0,0,0,0,0,0...
    FOOB,1,0,0,0,0,0,0...
    FOOC,1,0,0,0,1,0,0...
    FOOD,0,0,0,0,0,0,0...
    ...

What is a simple and elegant way of doing this (with a hash and arrays -> $hash{$key} = \@data)?

Thanks for the help!

Yohad

-2
5 answers

If you cannot describe your data and your desired result clearly, you cannot code it. Tackling a simple project like this is a good way to start working with a new language.

Let me show you a simple method you can use to crank out code in any language, whether you know it or not. This method only works for small projects; larger projects require actual planning.

How to write a program:

  • Open a text editor and write down what data you have. Make each line a comment.
  • Describe the desired results.
  • Describe the steps needed to transform your data into the desired form.

Steps 1 and 2 done:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Read data from multiple files and combine it into one file.
    # Source files:
    #   Field definitions: has a list of field names, one per line.
    #   Data files:
    #     * Each data file has a string of digits.
    #     * There is a one-to-one relationship between the digits in the
    #       data file and the fields in the field defs file.
    #
    # Results file:
    #   * The results file is a CSV file.
    #   * Each field will have one row in the CSV file.
    #   * The first column will contain the name of the field represented by the row.
    #   * Subsequent values in the row will be derived from the data files.
    #   * The order of subsequent fields will be based on the order the files are read.
    #   * However, each column (2-X) must represent the data from one data file.

Now that you know what you have and where you need to go, you can determine what the program needs to do to get you there - this is step 3:

You know you need to have a list of fields, so get this first:

    # Get a list of fields.
    #   Read the field definitions file into an array.

Since it is easiest to write a CSV file in a row-oriented fashion, you need to process all your files before generating each row. That means you need a place to store the data.

 # Create a variable to store the data structure. 

Now we read the data files:

    # Get a list of data files to parse
    # Iterate over list
    #   For each data file:
    #     Read the string of digits.
    #     Assign each digit to its field.
    #     Store data for later use.

We have all the data in memory, now write the output:

    # Write the CSV file.
    #   Open a file handle.
    #   Iterate over list of fields
    #     For each field
    #       Get field name and list of values.
    #       Create a string - comma separated string with field name and values
    #       Write string to file handle
    #   Close file handle.

Now you can start converting the comments into code. Each comment may take anywhere from 1 to 100 lines of code. You may find that something you need to do is very complicated and you don't want to tackle it right now. Make a dummy subroutine that handles the complex task and ignore it until you have everything else done. Then you can solve that thorny sub-problem on its own.
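To make step 3 concrete, here is a minimal sketch of how those comments might turn into Perl. It assumes the field definitions live in FOO_1.txt, one per line, and that the data files are passed on the command line; both are assumptions, since the question does not say how the list of data files is obtained.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Get a list of fields: read the field definitions file into an array.
    open my $defs, '<', 'FOO_1.txt' or die "Cannot open FOO_1.txt: $!";
    chomp( my @fields = <$defs> );
    close $defs;

    # Create a variable to store the data structure: field name => list of digits.
    my %data = map { $_ => [] } @fields;

    # Iterate over the data files (here, whatever was passed on the command line).
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        chomp( my $line = <$fh> );
        close $fh;

        # Assign each digit to its field and store it for later use.
        my @digits = split //, $line;
        push @{ $data{ $fields[$_] } }, $digits[$_] for 0 .. $#fields;
    }

    # Write the CSV file: one row per field, name first, then its values.
    open my $out, '>', 'FOO_RES.csv' or die "Cannot open FOO_RES.csv: $!";
    for my $field (@fields) {
        print {$out} join( ',', $field, @{ $data{$field} } ), "\n";
    }
    close $out;

Each comment from the outline maps to only a few lines here; the dummy-subroutine trick applies whenever one of those lines turns out to be harder than expected.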

Since you're just learning Perl, you will need to dig into the docs to learn how to accomplish each of the subtasks represented by the comments you wrote. The best resource for this kind of work is the list of functions by category in perlfunc. The Perl syntax documentation, perlsyn, is also useful. Since you will need to work with a complex data structure, you will also want to read the data structures cookbook, perldsc.

You might be wondering how you are supposed to know which perldoc pages to read for a given problem. An article on Perlmonks entitled How to RTFM provides a good introduction to the documentation and how to use it.

Best of all, if you get stuck, you'll have code to share when you ask for help.

+3

If I understand correctly, your first file is your key-order file, and the rest of the files contain one byte per key, in the same order. You want a combined file of these keys, with each key's data bytes listed together.

In that case, open all the files at once. Read one key from the key-order file and one byte from each data file, and print the assembled row once you've read from the last file. Repeat for each key.
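A minimal sketch of that approach, again assuming (an assumption, not part of the answer) that the keys live in FOO_1.txt, one per line, and that the data file names are passed on the command line:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Open every file up front: the key-order file plus one handle per data file.
    open my $keys, '<', 'FOO_1.txt' or die "Cannot open FOO_1.txt: $!";
    my @data_fh = map {
        open my $fh, '<', $_ or die "Cannot open $_: $!";
        $fh;
    } @ARGV;
    open my $out, '>', 'FOO_RES.csv' or die "Cannot open FOO_RES.csv: $!";

    # For each key, pull one digit from every data file and emit the row.
    while ( my $key = <$keys> ) {
        chomp $key;
        my @row = ($key);
        for my $fh (@data_fh) {
            read( $fh, my $digit, 1 ) == 1 or die "Ran out of data for $key";
            push @row, $digit;
        }
        print {$out} join( ',', @row ), "\n";
    }
    close $out;

This avoids holding everything in memory, at the cost of keeping one open handle per data file.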

+1

Your specifications are not clear. You cannot have "many other files" named FOO_files.txt, because that is only one name. So I'm going to work from the idea of a file list plus data files. In this case, there are files named FOO*.txt, each of which contains "[01]+\n".

The idea, then, is to process all the files named in the file list and merge them into the result file FOO_RES.csv, comma-separated.

    use strict;
    use warnings;
    use English qw<$OS_ERROR>;
    use IO::Handle;

    open my $foos, '<', 'FOO_1.txt' or die "I'm dead: $OS_ERROR";
    @ARGV = sort map { chomp; "$_.txt" } <$foos>;
    $foos->close;

    open my $foo_csv, '>', 'FOO_RES.csv' or die "I'm dead: $OS_ERROR";

    while ( my $line = <> ) {
        chomp $line;    # strip the newline so it doesn't become a trailing field
        my ( $foo_name ) = ( $ARGV =~ /(.*)\.txt$/ );
        $foo_csv->print( join( ',', $foo_name, split //, $line ), "\n" );
    }
    $foo_csv->close;
+1

It looks like you have a lot of foo_files that have 1 line in them, something like:

 1110000000 

Which means:

    fooa=1
    foob=1
    fooc=1
    food=0
    fooe=0
    foof=0
    foog=0
    fooh=0
    fooi=0
    fooj=0

And it looks like your foo_res is just a summation of these values? In that case, you do not need a hash of arrays, just a hash.

    my @foo_files = ();    # NOT SURE HOW YOU POPULATE THIS ONE
    my @foo_keys  = qw(fooa foob fooc food fooe foof foog fooh fooi fooj);

    my %foo_hash = map { ( $_, 0 ) } @foo_keys;    # initialize hash

    foreach my $foo_file ( @foo_files ) {
        open( my $FOO, "<", $foo_file ) || die "Cannot open $foo_file\n";
        my $line = <$FOO>;
        close( $FOO );

        chomp($line);
        my @foo_values = split( //, $line );

        foreach my $indx ( 0 .. $#foo_keys ) {
            # stop (or add error checking) if the input file doesn't have all the values
            last if ( ! defined $foo_values[ $indx ] );
            $foo_hash{ $foo_keys[$indx] } += $foo_values[ $indx ];
        }
    }

It is very difficult to understand what you are asking for, but maybe this helps?

+1

You really don't need to use a hash. My Perl is a little rusty, so the syntax may be a bit off, but basically it does this:

    open KEYFILE , "foo_1.txt" or die "cannot open foo_1 for writing";
    open VALFILE , "foo_files.txt" or die "cannot open foo_files for writing";
    open OUTFILE , ">foo_out.txt" or die "cannot open foo_out for writing";

    my %output;

    while (<KEYFILE>) {
        my $key = $_;
        my $val = <VALFILE>;
        my $arrVal = split(//,$val);
        $output{$key} = $arrVal;

        print OUTFILE $key."," . join(",", $arrVal)
    }

Edit: OK, syntax checked.

Comment from Sinan: @Byron, it seems to me that your first sentence says the OP does not need a hash, yet your code has %output, which seems to serve no purpose. For reference, here is a less error-prone way of doing the same thing.

    #!/usr/bin/perl

    use strict;
    use warnings;
    use autodie qw(:file :io);

    open my $KEYFILE, '<', "foo_1.txt";
    open my $VALFILE, '<', "foo_files.txt";
    open my $OUTFILE, '>', "foo_out.txt";

    while (my $key = <$KEYFILE>) {
        chomp $key;
        print $OUTFILE join(q{,}, $key, split //, <$VALFILE> ), "\n";
    }

    __END__
0
