Unix cut: print the same field twice

Say I have a file - a.csv

 ram,33,professional,doc
 shaym,23,salaried,eng

Now I need this output (please don't ask me why):

 ram,doc,doc,
 shaym,eng,eng,

I use the cut command

 cut -d',' -f1,4,4 a.csv 

But the output is

 ram,doc
 shaym,eng

This means that cut can print a field only once. I need to print the same field twice, or n times.

Why do I need it? (Optional reading.) It's a long story. I have a file like this:

 #,#,-,-
 #,#,#,#,#,#,#,-
 #,#,#,-

I have to convert it to

 #,#,-,-,-,-,-
 #,#,#,#,#,#,#,-
 #,#,#,-,-,-,-

Here, each "#" and "-" refers to different numerical data. Thank you

+9
5 answers

You cannot print the same field twice with cut. cut prints a selection of fields (or characters or bytes) in the order they appear in the input, each at most once. See "Combining two different cut outputs in one command?" and "Reordering fields/characters with the cut command" for similar questions.
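A quick way to see this behaviour is the small sketch below. It feeds the sample line from the question through printf rather than reading a.csv, and the variable names are only illustrative. However the list given to -f is ordered or repeated, cut emits each selected field once, in input order.

```shell
# Sample line taken from the question; fed via printf so no file is needed.
line='ram,33,professional,doc'

# Repeating field 4 in the list has no effect.
repeated=$(printf '%s\n' "$line" | cut -d, -f1,4,4)

# Listing the fields in reverse order does not reorder them either.
reversed=$(printf '%s\n' "$line" | cut -d, -f4,1)

printf '%s\n%s\n' "$repeated" "$reversed"
```

Both commands should print the same two fields, which is why a different tool is needed for duplication.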

The correct tool to use here is awk if your CSV does not have quotes around the fields.

 awk -F , -v OFS=, '{print $1, $4, $4}' 

If you don't want to use awk (why? What strange system has cut and sed but not awk?), you can use sed (still assuming your CSV doesn't have quotes around the fields). Match the first four comma-separated fields and emit the ones you need in the desired order.

 sed -e 's/^\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\)/\1,\4,\4/' 
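Since the question also asks about printing a field n times, the awk approach can be generalised with a loop. This is a minimal sketch, not part of the answer above: repeat_field is a hypothetical helper name, and it assumes unquoted CSV and that the first field should always lead the output.

```shell
# repeat_field FIELD COUNT: hypothetical helper, not part of awk or coreutils.
# Reads CSV on stdin and prints field 1, then field FIELD repeated COUNT times.
repeat_field() {
    awk -F , -v OFS=, -v f="$1" -v n="$2" '{
        out = $1
        for (i = 0; i < n; i++)
            out = out OFS $f
        print out
    }'
}

# Example: first field, then the fourth field three times.
printf 'ram,33,professional,doc\nshaym,23,salaried,eng\n' | repeat_field 4 3
```

Passing the field number and count with -v keeps the awk program itself fixed, so nothing has to be spliced into the quoted script.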
+8
 $ sed 's/,.*,/,/; s/\(,.*\)/\1\1,/' a.csv
 ram,doc,doc,
 shaym,eng,eng,

What does it do:

  • Replace everything between the first and last commas with only a comma
  • Repeat the remaining ",something" part and tack on a trailing comma. Voila!
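The two substitutions can be traced separately. The sketch below runs them one at a time on the first sample line (fed via printf instead of a.csv; step1 and step2 are just illustrative variable names) so the intermediate result is visible.

```shell
# Sample line taken from the question.
line='ram,33,professional,doc'

# Step 1 alone: the greedy ,.*, spans from the first comma to the last one
# and is replaced by a single comma.
step1=$(printf '%s\n' "$line" | sed 's/,.*,/,/')

# Step 2 on that result: \1 captures the comma and everything after it,
# which is written twice and followed by a trailing comma.
step2=$(printf '%s\n' "$step1" | sed 's/\(,.*\)/\1\1,/')

printf '%s\n%s\n' "$step1" "$step2"
```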

Assumptions:

  • You need the first field, then twice the last field
  • No spaces in the first and last field

Why do you need this output? :-)

+1

As others have noted, cut does not support field repetition.

You can combine cut and sed, for example, if the repeated element is at the end:

 < a.csv cut -d, -f1,4 | sed 's/,[^,]*$/&&,/' 

Output:

 ram,doc,doc,
 shaym,eng,eng,

Edit

To make the number of repetitions variable, you can do something like this (assuming you have coreutils):

 n=10
 rep=$(seq $n | sed 's:.*:\&:' | tr -d '\n')
 < a.csv cut -d, -f1,4 | sed 's/,[^,]*$/'"$rep"',/'

Output:

 ram,doc,doc,doc,doc,doc,doc,doc,doc,doc,doc,
 shaym,eng,eng,eng,eng,eng,eng,eng,eng,eng,eng,
+1

Using perl:

 perl -F, -ane 'chomp($F[3]);$a=$F[0].",".$F[3].",".$F[3];print $a."\n"' your_file 

Using sed:

 sed 's/\([^,]*\),.*,\(.*\)/\1,\2,\2/g' your_file 
+1

I had the same problem, but instead of listing every column in the awk print statement, I just used the following (to duplicate the 2nd column):

awk -v OFS='\t' '$2=$2"\t"$2' # for tab-delimited files

For CSV you can just use

awk -F , -v OFS=, '$2=$2","$2'
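This one-liner works because the assignment is used as an awk pattern: its value is a non-empty string, so the pattern is true and the default action prints the record, which awk has rebuilt with OFS after the field was modified. A small sketch on the question's sample rows (fed via printf; dup is just an illustrative variable name):

```shell
# Duplicate the 2nd column of each row; every other column is kept as-is.
dup=$(printf 'ram,33,professional,doc\nshaym,23,salaried,eng\n' |
    awk -F , -v OFS=, '$2 = $2 "," $2')

printf '%s\n' "$dup"
```

Note that, unlike the answers above, this keeps all the other columns, so it fits when only one column needs duplicating in place.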

0

Source: https://habr.com/ru/post/925304/

