Extract the first three columns from all tsv files in the folder

Question


I have several tsv files in a folder, over 50 GB in total. To keep memory usage down when loading these files into R, I want to extract only the first three columns of each file.

How can I do this for all files at once and print the result to the terminal? I am running Ubuntu 16.04.


linux bash csv

Keshav m Feb 08 '18 at 11:59


5 answers

ngj · Answer 1 · 2018-02-08T12:08:07+0000

Something like the following should work:

#!/bin/bash
for f in /path/to/*.tsv
do
    # Do something for each file. In our case, just print the first three
    # tab-separated fields (tab is cut's default delimiter):
    cut -f1-3 < "$f"
done

(see this web page for more information on iterating files in bash.)
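A slightly more defensive version of the same loop, as a sketch: it assumes the files end in .tsv, quotes every path so file names with spaces survive, and skips the literal pattern that the shell leaves behind when no file matches. first3_cols is a hypothetical helper name, and the demo uses throwaway files in a temporary directory.

```shell
# Sketch: print the first three tab-separated columns of every .tsv file
# in directory "$1". first3_cols is a hypothetical name.
first3_cols() {
    for f in "$1"/*.tsv; do
        # With no matches the glob stays literal, so skip non-existent paths.
        [ -e "$f" ] || continue
        cut -f1-3 -- "$f"    # tab is cut's default field delimiter
    done
}

# Demo on throwaway data; one file name contains a space on purpose.
demo=$(mktemp -d)
printf 'a\tb\tc\td\n' > "$demo/one.tsv"
printf '1\t2\t3\t4\n' > "$demo/two words.tsv"
first3_cols "$demo"
rm -r "$demo"
```

Everything goes to standard output, so the result can be redirected to a single file or piped into another command.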


If you want to overwrite each file with only its first three columns, you can modify the script:

#!/bin/bash
for f in /path/to/*.tsv
do
    # Do something for each file. In our case, write the first three fields to a
    # temporary file and, only if cut succeeded, rename it over the original file:
    cut -f1-3 < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

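The output goes to a .tmp file first because redirecting cut's output straight back to "$f" would truncate the file before cut gets to read it.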
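A small demonstration on throwaway files, assuming a POSIX shell with mktemp: redirecting cut's output onto its own input leaves an empty file, while writing to a separate file and renaming it afterwards keeps the data. Chaining the rename with && means the original is only replaced when cut succeeds.

```shell
# Demo on throwaway files (assumes a POSIX shell with mktemp).
demo=$(mktemp -d)
printf 'a\tb\tc\td\n1\t2\t3\t4\n' > "$demo/bad.tsv"
cp "$demo/bad.tsv" "$demo/good.tsv"

# Unsafe: the shell truncates bad.tsv while setting up the redirection,
# before cut starts, so cut reads an empty file.
cut -f1-3 "$demo/bad.tsv" > "$demo/bad.tsv"

# Safer: write to a separate file, then rename it over the original
# only if cut succeeded.
cut -f1-3 "$demo/good.tsv" > "$demo/good.tsv.tmp" &&
    mv "$demo/good.tsv.tmp" "$demo/good.tsv"

bad_bytes=$(wc -c < "$demo/bad.tsv")
good=$(cat "$demo/good.tsv")
echo "bad.tsv now has $bad_bytes bytes"
printf '%s\n' "$good"
rm -r "$demo"
```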

Tobias Ribizel · Answer 2 · 2018-02-08T12:09:38+0000

You can call cut on all the files at once:

cut -d$'\t' -f 1-3 folder/*

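-d sets the delimiter (a tab, which is also cut's default), -f selects the fields to keep, and folder/* is a glob that matches every file in the folder, so all of them are processed in one call and their output is printed one after another.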
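A quick check on throwaway files, assuming GNU or POSIX cut: since tab is the default delimiter, -d can be left out for TSV data, and with several file arguments cut writes their results one after another. printf '\t' is used here to get a literal tab without relying on bash-only quoting.

```shell
# Demo on throwaway files.
demo=$(mktemp -d)
printf 'a\tb\tc\td\n' > "$demo/1.tsv"
printf 'e\tf\tg\th\n' > "$demo/2.tsv"

tab=$(printf '\t')                                  # a literal tab character
with_d=$(cut -d "$tab" -f 1-3 "$demo"/*)            # delimiter given explicitly
default=$(cut -f 1-3 "$demo"/*)                     # default delimiter (tab)

[ "$with_d" = "$default" ] && echo "identical output"
printf '%s\n' "$default"                            # one file after the other
rm -r "$demo"
```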

John Zwinck · Answer 3 · 2018-02-08T12:10:18+0000

If the goal is just to load less data into R, data.table's fread can read only the columns you need:

library(data.table)
fread("foo.tsv", sep = "\t", select = 1:3)

M. Becerra · Answer 4 · 2018-02-08T12:07:21+0000

You can use find together with awk:

find ./ -type f -name "*.tsv" -exec awk -F'\t' -v OFS='\t' '{ print $1,$2,$3 }' {} \;

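This prints the first three columns of every .tsv file under the current directory (including subfolders) to the terminal.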
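awk's default field separator is any run of blanks (spaces or tabs), so for TSV data whose fields may contain spaces it is safer to set it to a tab explicitly with -F'\t'. A one-line example with made-up data:

```shell
# Made-up TSV record whose first field contains a space.
line=$(printf 'New York\tNY\t8.3\textra')

# Default FS: blanks split "New York" into two fields, so $3 is "NY".
default_third=$(printf '%s\n' "$line" | awk '{ print $3 }')

# Tab FS: fields are exactly the tab-separated columns, so $3 is "8.3".
tab_third=$(printf '%s\n' "$line" | awk -F'\t' '{ print $3 }')

echo "default FS: $default_third"
echo "tab FS:     $tab_third"
```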

To save the result to a file instead, redirect the output:

find ./ -type f -name "*.tsv" -exec awk -F'\t' -v OFS='\t' '{ print $1,$2,$3 }' {} \; >> someOtherFile
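If you would rather keep one output per input file than collect everything in a single file, awk can choose the output name itself. This is a sketch: the .first3 suffix and the name first3_per_file are hypothetical, and find's "-exec ... {} +" form passes many files to a single awk process. Closing the previous output when a new input file starts keeps the number of open files small.

```shell
# Sketch: for every *.tsv under directory "$1", write its first three
# tab-separated columns to "<file>.first3" next to it.
first3_per_file() {
    find "$1" -type f -name '*.tsv' -exec awk -F'\t' -v OFS='\t' '
        FNR == 1 {                      # a new input file starts here
            if (out != "") close(out)   # finish the previous output file
            out = FILENAME ".first3"
        }
        { print $1, $2, $3 > out }
    ' {} +
}

# Demo on throwaway data, including a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'a\tb\tc\td\n' > "$demo/x.tsv"
printf 'e f\tg\th\ti\n' > "$demo/sub/y.tsv"
first3_per_file "$demo"
cat "$demo/x.tsv.first3" "$demo/sub/y.tsv.first3"
rm -r "$demo"
```

The outputs end in .first3 rather than .tsv, so running the command again does not pick them up as inputs.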

JonDeg · Answer 5 · 2018-02-09T04:45:05+0000

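Since you are loading the data into R anyway, you can run the shell command from R and read its output directly, without writing intermediate files.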

With base R (returns a data.frame):

> df1 = read.table(pipe("cut -f 1-3 *.tsv"), sep="\t", header=FALSE, quote="")

With tidyverse/readr (returns a tibble):

> df2 = read_tsv(pipe("cut -f 1-3 *.tsv"))

With data.table (returns a data.table, which is also a data.frame):

> df3 = fread(cmd = "cut -f 1-3 *.tsv")

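Because the input goes through the Unix shell, you can add any other preprocessing step to the command. For example, to read a random sample of 10 000 rows: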

> df4 = fread(cmd = "cut -f 1-3 *.tsv | shuf -n 10000")
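If every file starts with the same header line, cut -f 1-3 *.tsv repeats that header once per file, and the repeats end up as data rows in R. A common awk idiom keeps only the first header: NR counts lines across all inputs, while FNR restarts at 1 for each file. This is a sketch under the shared-header assumption; first3_one_header is a hypothetical name, and its output could be read from R through a pipe in the same way as the cut commands.

```shell
# Sketch: first three tab-separated columns of all given files, keeping the
# header line of the first file only. Assumes all files share one header.
first3_one_header() {
    awk -F'\t' -v OFS='\t' 'NR == 1 || FNR > 1 { print $1, $2, $3 }' "$@"
}

# Demo on throwaway data with a header in each file.
demo=$(mktemp -d)
printf 'id\tname\tvalue\textra\n1\ta\t10\tx\n' > "$demo/part1.tsv"
printf 'id\tname\tvalue\textra\n2\tb\t20\ty\n' > "$demo/part2.tsv"
first3_one_header "$demo"/*.tsv
rm -r "$demo"
```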

