How to select random files from a directory in bash?

I have a directory with approximately 2000 files. How can I choose a random selection of N files, using either a bash script or a list of piped commands?

+105
bash random
Jan 05 '09 at 19:15
12 answers

Here is a script that uses GNU sort's random option:

 ls | sort -R | tail -$N | while read file; do
     # Something involving $file, or you can leave
     # off the while to just get the filenames
 done
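For instance, here is a minimal usage sketch that copies N random files to a destination directory (the /tmp/sample path is just a placeholder, and, like the one-liner above, it assumes file names contain no newlines):

 N=10
 ls | sort -R | tail -n "$N" | while read -r file; do
     cp -- "$file" /tmp/sample/
 done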
+138
Jan 05 '09 at 20:01

You can use shuf (from the GNU coreutils package). Just give it a list of file names and ask it to return the first line from a random permutation:

 ls dirname | shuf -n 1

 # probably faster and more flexible:
 find dirname -type f | shuf -n 1

 # etc..

Adjust the value of -n, --head-count=COUNT to return the number of lines required. For example, to return 5 random file names you would use:

 find dirname -type f | shuf -n 5 
+79
Sep 04 '13 at 14:59

Here are a few possibilities that do not parse the output of ls and that are 100% safe with respect to files with spaces and funny characters in their names. All of them populate the array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.

  • This one may output the same file several times, and N must be known in advance. Here I chose N = 42.

      a=( * )
      randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )

    This feature is not well documented.

  • If N is not known in advance but you really liked the previous possibility, you can use eval . But this is evil, and you must really make sure that N does not come directly from user input without being thoroughly checked!

      N=42
      a=( * )
      eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )

    I personally dislike eval, and hence this answer!

  • Same thing using a simpler method (loop):

      N=42
      a=( * )
      randf=()
      for((i=0;i<N;++i)); do
          randf+=( "${a[RANDOM%${#a[@]}]}" )
      done
  • If you do not want to have the same file several times:

      N=42
      a=( * )
      randf=()
      for((i=0;i<N && ${#a[@]};++i)); do
          ((j=RANDOM%${#a[@]}))
          randf+=( "${a[j]}" )
          a=( "${a[@]:0:j}" "${a[@]:j+1}" )
      done

Note: this is a late reply to an old post, but the accepted answer links to an external page that shows scary bash practice, and the other answer is not much better, since it also parses the output of ls . A comment on the accepted answer points to Lhunath's excellent answer, which obviously shows good practice, but does not exactly answer the OP.

+17
Jul 01 '13 at 18:08

A simple solution for selecting 5 random files that avoids parsing ls . It also works with files containing spaces, newlines, and other special characters:

 shuf -ezn 5 * | xargs -0 -n1 echo 

Replace echo with the command you want to execute on your files.
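For example, to copy the five chosen files into a backup directory instead of just printing them (a sketch; /backup/ is a placeholder, and the -v/-t flags assume GNU cp):

 shuf -ezn 5 * | xargs -0 cp -vt /backup/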

+7
Aug 30 '17 at 8:16
 ls | shuf -n 10 # ten random files 
+7
Sep 15 '17 at 7:55

If you have Python installed (works with either Python 2 or Python 3):

To select a single file (or line from an arbitrary command), use

 ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())" 

To select N files/lines, use the following (note that N is at the end of the command; replace it with a number):

 ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N 
+4
Nov 03 '15 at 21:14

This is an even later response to @gniourf_gniourf's late answer above, which I just upvoted because it is by far the best answer, twice over (once for avoiding eval and once for handling file names safely).

But it took me a few minutes to unravel the "not very well documented" feature(s) that answer uses. If your Bash skills are solid enough that you saw immediately how it works, skip this comment. But I didn't, and having unraveled it, I think it is worth explaining.

Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the oddities of file names, so the list is guaranteed to be correct and guaranteed to be escaped. No need to worry about correctly parsing textual file names returned by ls.

Feature #2 is Bash parameter expansions for arrays, one nested inside the other. This starts with ${#ARRAY[@]} , which expands to the length of $ARRAY .

That expansion is then used to index the array. The standard way to get a random number between 1 and N is to take a random number modulo N. We need a random number between 0 and the length of our array. Here is the approach split into two lines for clarity:

 LENGTH=${#a[@]}
 RANDOM_PICK=${a[RANDOM%$LENGTH]}   # don't assign the result to RANDOM itself: that would reseed the generator

But this solution does it on one line, removing unnecessary variable assignment.

Feature #3 is Bash brace expansion, although I must admit that I do not quite understand it. Brace expansion is used, for example, to generate a list of 25 files named filename1.txt , filename2.txt , and so on: echo "filename"{1..25}".txt" .

The expression inside the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}" , uses this trick to produce 42 separate expansions. The brace expansion places a single digit between the ] and the } , which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also return 42 consecutive elements from a random spot in the array, which is not at all the same thing as returning 42 random elements from the array.) I think it simply makes the shell run the expansion 42 times, thereby returning 42 random elements from the array. (But if someone can explain it more fully, I would love to hear it.)
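A quick way to see the trick in action is to run the same construct from that answer on a tiny array, scaled down to 3 elements and 5 picks:

 a=( one two three )
 randf=( "${a[RANDOM%${#a[@]}]"{1..5}"}" )   # five random picks, repeats possible
 printf '%s\n' "${randf[@]}"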

The reason N must be hard-coded (as 42) is that brace expansion happens before variable expansion.
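You can verify that ordering yourself: because brace expansion runs before parameter expansion, a variable inside the braces never forms a valid numeric range.

 N=5
 echo {1..5}    # prints: 1 2 3 4 5
 echo {1..$N}   # prints the literal text {1..5}; the braces were not expanded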

Finally, here is Feature #4 , if you want to do this recursively over a directory hierarchy:

 shopt -s globstar
 a=( ** )

This enables the shell option that causes ** to match recursively. Now your array $a contains every file in the entire hierarchy.
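Note that ** also matches directories. If you only want regular files, you could filter the array afterwards; a minimal sketch (this filtering step is my addition, not part of the answer above):

 shopt -s globstar
 a=( ** )
 files=()
 for f in "${a[@]}"; do
     [[ -f $f ]] && files+=( "$f" )   # keep regular files only
 done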

+4
Aug 01 '16 at 23:49

This is the only script I could get to work with bash on MacOS. I combined and edited snippets from the following two links:

ls command: how can I get a recursive full-path listing, one line per file?

http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/

 #!/bin/bash
 # Reads a given directory and picks a random file.

 # The directory you want to use. You could use "$1" instead if you
 # wanted to parametrize it.
 DIR="/path/to/"
 # DIR="$1"

 # Internal Field Separator set to newline, so file names with
 # spaces do not break our script.
 IFS='
 '

 if [[ -d "${DIR}" ]]
 then
     # Runs ls on the given dir, and dumps the output into a matrix,
     # it uses the newline character as a field delimiter, as explained above.
     # file_matrix=($(ls -LR "${DIR}"))
     file_matrix=($(ls -R $DIR | awk '
         /:$/&&f{s=$0;f=0}
         /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}
         NF&&f{ print s"/"$0 }'))
     num_files=${#file_matrix[*]}

     # This is the command you want to run on a random file.
     # Change "ls -l" to anything you want; it is just an example.
     ls -l "${file_matrix[$((RANDOM%num_files))]}"
 fi

 exit 0
+1
Oct 02 '14 at 12:28

MacOS has neither the sort -R nor the shuf command, so I needed a bash-only solution that randomizes all files without duplicates, and I did not find that here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.

The script should be easy to modify to stop after N samples, either using a counter with if, or gniourf_gniourf's for loop with N (a sketch follows the script below). $RANDOM is limited to ~32000 files, but that should be enough for most cases.

 #!/bin/bash

 array=(*)   # this is the array of files to shuffle
 # echo ${array[@]}

 for dummy in "${array[@]}"; do   # do loop length(array) times; once for each file
     length=${#array[@]}
     randomi=$(( $RANDOM % $length ))   # select a random index

     filename=${array[$randomi]}
     echo "Processing: '$filename'"     # do something with the file

     unset -v "array[$randomi]"   # set the element at index $randomi to NULL
     array=("${array[@]}")        # remove NULL elements introduced by unset; copy array
 done
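A minimal sketch of the "stop after N samples" modification mentioned above (the N=5 value and the counter are my additions, not part of the original script):

 #!/bin/bash
 N=5
 count=0
 array=(*)
 for dummy in "${array[@]}"; do
     (( count >= N )) && break        # stop once N files have been processed
     length=${#array[@]}
     randomi=$(( RANDOM % length ))
     echo "Processing: '${array[$randomi]}'"
     unset -v "array[$randomi]"
     array=("${array[@]}")
     count=$(( count + 1 ))
 done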
+1
Dec 17 '17 at 11:13

I use this: it uses a temporary file, but it descends into the directory tree until it finds a regular file, and returns it.

 # find a quasi-random file in a directory tree:

 # directory to start the search from:
 ROOT="/";

 tmp=/tmp/mytempfile
 TARGET="$ROOT"
 FILE="";
 n=
 r=

 while [ -e "$TARGET" ]; do
     TARGET="$(readlink -f "${TARGET}/$FILE")" ;
     if [ -d "$TARGET" ]; then
         ls -1 "$TARGET" 2> /dev/null > $tmp || break;
         n=$(cat $tmp | wc -l);
         if [ $n != 0 ]; then
             FILE=$(shuf -n 1 $tmp)
             # or if you don't have/want to use shuf:
             # r=$(($RANDOM % $n)) ;
             # FILE=$(tail -n +$(( $r + 1 )) $tmp | head -n 1);
         fi ;
     else
         if [ -f "$TARGET" ] ; then
             rm -f $tmp
             echo $TARGET
             break;
         else
             # is not a regular file, restart:
             TARGET="$ROOT"
             FILE=""
         fi
     fi
 done;
0
Apr 28 '15 at 12:04

If you have a lot of files in the folder, you can use the piped command below, which I found on Unix StackExchange.

 find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/ 

Here I wanted to copy the files, but if you want to move them or do something else, just change the final command, where I used cp (a variant for moving is sketched below).
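For example, a variant that moves the eight selected files instead of copying them (a sketch; the -v/-t flags assume GNU mv):

 find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 mv -vt /target/dir/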

0
Mar 28 '19 at 12:45

How about a Perl solution, slightly adapted from Mr. Kang's answer here:
How to shuffle lines of a text file on a Unix command line or in a shell script?

 $ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'

-1
Jun 06 '17 at 2:11


