How to search / replace across a bunch of text files in Unix (OS X)

I have a regex that I successfully tested at http://regexpal.com/ :

^(\".+?\"),\d.+?,"X",-99,-99,-99,-99,-99,-99,-99,(\d*),(\d*) 

Where my test data is as follows:

 "AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
 "AB101AF",10,"X",-99,-99,-99,-99,-99,-99,-99,394181,806429,179,"S00","SN9","00","QA","MH","X"
 "AB101AG",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
 "AB101AH",10,"X",-99,-99,-99,-99,-99,-99,-99,394371,806359,179,"S00","SN9","00","QA","MH","X"
 "AB101AJ",10,"X",-99,-99,-99,-99,-99,-99,-99,394171,806398,179,"S00","SN9","00","QA","MH","X"
 "AB101AL",10,"X",-99,-99,-99,-99,-99,-99,-99,394331,806530,179,"S00","SN9","00","QA","MH","X"

I want to replace it with \1,\2,\3 on each line, so for example, line 1 will give

 "AB101AA",394251,806376 

How can I run this search and replace across all the CSV files in a folder on OS X? I tried sed, but it complains about a syntax error (and I'm not sure it supports this regular expression). Also, do the anchors ^ (beginning of line) and $ (end of line) apply to each line, or to the beginning and end of the whole file?

UPDATE: some good answers with cut, awk etc. that extract certain fields from the CSV, but I have since found out that I need to split each of those numbers into 2 sub-values, so my example from above should look like this:

 "AB101AA",3,94251,8,06376 

As far as I know, for this I need to use a regular expression.
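A regex is not strictly necessary for this: awk's substr() can peel off the leading digit of each field. A minimal sketch, assuming (from the example above) that exactly the first digit is split off; the sample record is inlined here so the snippet is self-contained, whereas in practice you would read the CSV file:

```shell
# Print field 1, then fields 11 and 12 each split into
# their first digit and the remainder, comma-separated.
printf '%s\n' '"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"' |
awk -F, -v OFS=, '{print $1, substr($11,1,1), substr($11,2), substr($12,1,1), substr($12,2)}'
# "AB101AA",3,94251,8,06376
```

Note that substr() keeps the leading zero of 06376, which a purely numeric split would lose.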

3 answers
 for file in *.csv; do
     cp "$file" "${file}.bak" &&
     awk -F, 'BEGIN {OFS=","} {print $1,$11,$12}' "${file}.bak" > "$file"
 done

or

 sed -i.bak 's/^\("[^"]\+"\),[0-9]\+,"X",-99,-99,-99,-99,-99,-99,-99,\([0-9]\+\),\([0-9]\+\).*/\1,\2,\3/' FILE(S) 

(sed does not understand \d, and without the trailing .* the remaining fields would be left in place.)

eg:

 $ sed 's/^\("[^"]\+"\),[0-9]\+,"X",-99,-99,-99,-99,-99,-99,-99,\([0-9]\+\),\([0-9]\+\).*/\1,\2,\3/' <<EOF
 "AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
 "AB101AF",10,"X",-99,-99,-99,-99,-99,-99,-99,394181,806429,179,"S00","SN9","00","QA","MH","X"
 "AB101AG",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"
 "AB101AH",10,"X",-99,-99,-99,-99,-99,-99,-99,394371,806359,179,"S00","SN9","00","QA","MH","X"
 "AB101AJ",10,"X",-99,-99,-99,-99,-99,-99,-99,394171,806398,179,"S00","SN9","00","QA","MH","X"
 "AB101AL",10,"X",-99,-99,-99,-99,-99,-99,-99,394331,806530,179,"S00","SN9","00","QA","MH","X"
 EOF
 "AB101AA",394251,806376
 "AB101AF",394181,806429
 "AB101AG",394251,806376
 "AB101AH",394371,806359
 "AB101AJ",394171,806398
 "AB101AL",394331,806530
 $
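One caveat for the asker's platform: the stock sed on OS X is BSD sed, which does not support the GNU-only \+ escape, and this is a likely cause of the reported syntax error. Switching to extended regexes with -E works on both BSD and GNU sed; a sketch (note that adding the (-99,){7} repetition group shifts the coordinate backreferences to \3 and \4):

```shell
# -E enables extended regexes on both BSD (OS X) and GNU sed,
# avoiding the GNU-only \+ escape. (-99,){7} matches the seven
# -99 fields; the captured coordinates are then groups 3 and 4.
printf '%s\n' '"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"' |
sed -E 's/^("[^"]+"),[0-9]+,"X",(-99,){7}([0-9]+),([0-9]+).*/\1,\3,\4/'
# "AB101AA",394251,806376
```

For in-place editing, BSD sed requires an explicit (possibly empty) backup suffix — `sed -i '' -E '…' *.csv` — whereas GNU sed accepts `sed -i -E '…' *.csv`.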

HTH


Do you want to extract fields 1, 11 and 12? For such a task, awk or cut is truly better suited! For instance:

 awk -F, '{print $1, $11, $12}' input 

using cut:

 cut -d, -f1,11,12 input 

using perl. -a turns on autosplit mode: perl automatically splits input lines on whitespace into the @F array. -F, used in conjunction with -a, sets the delimiter on which to split the lines instead.

 perl -F, -lane 'printf "%s, %d, %d\n", $F[0], $F[10], $F[11]' input 

... and finally a clean bash solution

 #!/bin/bash
 IFS=,
 while read -ra ARRAY; do
     echo ${ARRAY[0]}, ${ARRAY[10]}, ${ARRAY[11]}
 done < input
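As a quick sanity check, the same loop can be fed one sample record through a pipe instead of the assumed input file (read -a is a bashism, so this needs bash, not plain sh):

```shell
# Same comma-split field extraction, fed a single inlined record.
printf '%s\n' '"AB101AA",10,"X",-99,-99,-99,-99,-99,-99,-99,394251,806376,179,"S00","SN9","00","QA","MH","X"' |
while IFS=, read -ra ARRAY; do
    echo "${ARRAY[0]}, ${ARRAY[10]}, ${ARRAY[11]}"
done
# "AB101AA", 394251, 806376
```

Like the cut and awk approaches, this assumes no field contains an embedded comma, since a quoted field such as "A,B" would be split in two.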
 cd folder
 for file in $(find . -type f -name '*.csv')
 do
     echo $file
     awk -F"," '{printf("%s,%s,%s\n", $1, $11, $12)}' $file > /tmp/${file}.$$
     #awk -F"," '/^(\".+?\"),[0-9]+?,"X",-99,-99,-99,-99,-99,-99,-99,([0-9]+),([0-9]+)/ {printf("%s,%s,%s\n", $1, $11, $12)}' $file > /tmp/${file}.$$
     #mv /tmp/${file}.$$ ${file}
 done

Comment out the first awk and uncomment the second one if you need the regular expression. Uncomment the final mv once you have tested.

