How to remove the first column (which is actually row names) from a data file in Linux?

I have a data file with many thousands of columns and rows. I want to remove the first column, which is actually a row counter. I used this command in Linux:

cut -d " " -f 2- input.txt > output.txt 

but nothing changed in my output. Does anyone know why this is not working and what I should do?

This is what my input file looks like:

 col1 col2 col3 col4 ...
 1 0 0 0 1
 2 0 1 0 1
 3 0 1 0 0
 4 0 0 0 0
 5 0 1 1 1
 6 1 1 1 0
 7 1 0 0 0
 8 0 0 0 0
 9 1 0 0 0
 10 1 1 1 1
 11 0 0 0 1
 . . .

I want my result to look like this:

 col1 col2 col3 col4 ...
 0 0 0 1
 0 1 0 1
 0 1 0 0
 0 0 0 0
 0 1 1 1
 1 1 1 0
 1 0 0 0
 0 0 0 0
 1 0 0 0
 1 1 1 1
 0 0 0 1
 . . .

I also tried the sed command:

  sed '1d' input.file > output.file 

But it deletes the first row, not the first column.

Can anyone help me?

+16
5 answers

@Karafka I had CSV files, so I set the separator to "," (you can replace it with your own delimiter):

 cut -d"," -f2- input.csv > output.csv 

Then I used a loop to iterate over all the files inside the directory:

 # files are in the directory tmp/
 for f in tmp/*
 do
   name=$(basename "$f")
   echo "processing file: $name"
   # keep all columns except the first one of each CSV file
   cut -d"," -f2- "$f" > "new/$name"
   # output files keep the same names and are stored in the directory new/
 done
+6

The idiomatic use of cut would be

 cut -f2- input > output 

if the separator is a tab ("\t").

Or simply with awk magic (this works for both space and tab delimiters):

  awk '{$1=""}1' input | awk '{$1=$1}1' > output 

The first awk empties field 1 but leaves its delimiter; the second awk rebuilds the record, which drops the leftover delimiter. The default output separator is a space; if you want tab-separated output, add -v OFS="\t" to the second awk.
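The two awk steps can be tried on a small sample before running them on the real file. This is only a sketch: the temporary file and its three rows are made-up data modeled on the question, and the variable names are illustrative.

```shell
# Sample data in a temporary file (hypothetical values modeled on the question).
tmp=$(mktemp)
printf ' 1 0 0 0 1\n 2 0 1 0 1\n10 1 1 1 1\n' > "$tmp"

# Step 1 empties field 1; step 2 reassigns $1 so that awk rebuilds the
# record from the remaining fields, joined by OFS (a tab here).
result=$(awk '{$1=""}1' "$tmp" | awk -v OFS='\t' '{$1=$1}1')

printf '%s\n' "$result"
rm -f "$tmp"
```

Because awk's default field splitting ignores leading whitespace, the rows with and without a leading space are handled the same way.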

UPDATED

Based on your updated input, the problem is that the leading spaces are treated as extra (empty) columns. One way to handle this is to remove them before passing the input to cut.

 sed 's/^ *//' input | cut -d" " -f2- > output 

or use the awk alternative, which will work in this case too.
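To see why the original cut command appeared to do nothing, it may help to run both variants on a single line. The sample line below is an assumption based on the question's input (one leading space, single spaces between values).

```shell
# One sample line with a leading space, as in the question's input.
line=' 1 0 0 0 1'

# With -d" ", the leading space makes field 1 empty, so -f2- still
# contains the row counter.
unchanged=$(printf '%s\n' "$line" | cut -d" " -f2-)

# Stripping the leading spaces first makes the counter field 1.
fixed=$(printf '%s\n' "$line" | sed 's/^ *//' | cut -d" " -f2-)

printf 'without sed: %s\nwith sed:    %s\n' "$unchanged" "$fixed"
```

The first variant only removes the empty field in front of the counter, which is why the output looked unchanged apart from the missing leading space.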

+15

You can use cut with the --complement option (a GNU extension):

 cut -f1 -d" " --complement input.file > output.file 

This will output all the columns except the first.
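As a quick check, assuming GNU cut and a tab-separated sample line (made-up data), --complement with -f1 selects the same fields as -f2-:

```shell
# Tab-separated sample line (hypothetical data); tab is cut's default delimiter.
sample=$(printf '1\t0\t0\t0\t1')

# GNU cut only: print every field except field 1.
kept=$(printf '%s\n' "$sample" | cut --complement -f1)

# The same selection without the GNU-specific option.
portable=$(printf '%s\n' "$sample" | cut -f2-)

printf '%s\n%s\n' "$kept" "$portable"
```

Like any cut invocation with -d" ", the command above is still affected by leading spaces in the input.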

+13

As @karakfa notes, this looks like a leading space that is causing your problems.

Here is a sed one-liner to do the job (it handles leading spaces or tabs):

 sed -i.bak "s|^[ \t]\+[0-9]\+[ \t]\+||" input.txt 

Explanation:

 -i      edit the existing file in place
 .bak    back up the original file, adding the .bak extension (can be whatever you like)
 s       substitute
 |       separator (easiest character to read as a sed separator, IMO)
 ^       start the match at the start of the line
 [ \t]   match a space or a tab
 \+      match one or more times (the escape is required so sed does not interpret '+' literally)
 [0-9]   match any digit 0-9

As noted, input.txt will be edited in place, and its original contents will be saved as input.txt.bak. Use plain -i instead if you do not want to back up the source file.

In addition, if you know that the leading whitespace is definitely spaces (not tabs), you can shorten it to this:

 sed -i.bak "s|^ \+[0-9]\+[ \t]\+||" input.txt 
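Since -i rewrites the file, it is worth trying the command on a scratch copy first. The sketch below assumes GNU sed; the directory, file contents, and variable names are made up for illustration.

```shell
# Work on a scratch copy so that no real file is modified (hypothetical data).
dir=$(mktemp -d)
printf ' col1 col2 col3 col4\n 1 0 0 0 1\n\t2 0 1 0 1\n 10 1 1 1 1\n' > "$dir/input.txt"

# GNU sed: strip the leading blanks, the counter, and the blanks after it.
sed -i.bak "s|^[ \t]\+[0-9]\+[ \t]\+||" "$dir/input.txt"

edited=$(cat "$dir/input.txt")
backup_first_line=$(head -n 1 "$dir/input.txt.bak")

printf '%s\n' "$edited"
rm -rf "$dir"
```

The header line is left untouched because it does not start with a number. Note that the first \+ requires at least one leading blank, so a row that starts directly with its counter would not be changed by this pattern.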
0

You can also achieve this with grep:

 grep -E -o '[[:digit:]]([[:space:]][[:digit:]]){3}$' input.txt 

This assumes single-digit numbers separated by single spaces. To allow a variable number of spaces and digits, you can do:

 grep -E -o '[[:digit:]]+([[:space:]]+[[:digit:]]+){3}$' input.txt 

If your grep supports the -P flag ( --perl-regexp ), you can do:

 grep -P -o '\d+(\s+\d+){3}$' input.txt 

Here are a few options if you are using GNU sed:

 sed 's/^\s\+\w\+\s\+//' input.txt
 sed 's/^\s\+\S\+\s\+//' input.txt
 sed 's/^\s\+[0-9]\+\s\+//' input.txt
 sed 's/^\s\+[[:digit:]]\+\s\+//' input.txt

Note that the grep regular expressions match the parts that we want to keep, while the sed regular expressions match the parts that we want to remove.
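A small run of the grep variant on made-up sample lines shows what "keeping" means in practice. The {3} in the pattern assumes exactly four data columns, as in the question's excerpt.

```shell
# Hypothetical sample: a header line plus two data rows.
matches=$(printf ' col1 col2 col3 col4\n 1 0 0 0 1\n 10 1 1 1 1\n' |
  grep -E -o '[[:digit:]]+([[:space:]]+[[:digit:]]+){3}$')

printf '%s\n' "$matches"
```

With this sample, only the last four numbers of each data row are printed. The header line does not match the pattern, so it is dropped from the output, which the sed variants avoid.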

0
