Linux command: how to "find" only text files?

After a few Google searches, I came up with the following:

find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text 

which is very unwieldy and prints unneeded text, such as the MIME type information. Any better solutions? I have many images and other binary files in the same folder as a lot of text files that I need to search through.

+61
linux find search
Jan 22 '11 at 10:55
14 answers

I know this is an old thread, but I came across it and thought that I would share my method, which, as I found, is a very quick way to use find to search for non-binary files only:

 find . -type f -exec grep -Iq . {} \; -and -print 

The -I option tells grep to immediately ignore binary files, and the . pattern together with -q makes it match text files immediately, so it runs very quickly. You can change -print to -print0 for piping into xargs -0 or similar if you are worried about spaces (thanks for the tip, @lucas.werkmeister!)

Also, the first dot is only needed for some BSD versions of find, such as the one on OS X, but it does no harm to leave it there all the time if you want to put this in an alias or something.
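For repeated use, the command can be wrapped in a small shell function. This is only a sketch: the name list_text_files is made up for illustration, and it assumes a grep that supports -I (GNU and BSD grep do).

```shell
# list_text_files is a hypothetical helper name, not part of the original answer.
# Usage: list_text_files [DIRECTORY]   (defaults to the current directory)
list_text_files() {
    # grep -I treats binary files as non-matching; -q stops at the first match.
    # The pattern "." needs a line with at least one character, so empty files
    # (and files containing only blank lines) are not listed.
    find "${1:-.}" -type f -exec grep -Iq . {} \; -print
}
```

Running list_text_files my_folder prints one path per line. Note that, because of the . pattern, empty files are skipped.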

+106
Dec 01

Why is this inconvenient? If you need to use it often and don’t want to enter it every time, just define a bash function for it:

 function findTextInAsciiFiles {
     # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
     find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
 }

put it in your .bashrc and then just run:

 findTextInAsciiFiles your_folder "needle text" 

whenever you want.




EDIT to reflect the OP's edit:

If you want to cut out the mime information, you can add one more stage to the pipeline that filters it out. This should do the trick, by keeping only what comes before the ":" with cut -d':' -f1:

 function findTextInAsciiFiles {
     # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
     find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
 }
+10
Jan 22 '11 at 11:09

Based on this SO question:

grep -rIl "needle text" my_folder
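As a sketch of what each flag does, here is the same command wrapped in a function. The name grep_text_only is hypothetical, and the -- separator is an addition not present in the answer.

```shell
# grep_text_only is a hypothetical wrapper name used for illustration.
# Usage: grep_text_only DIRECTORY NEEDLE_TEXT
grep_text_only() {
    # -r: recurse into DIRECTORY
    # -I: treat binary files as if they contained no match
    # -l: print only the names of matching files
    # --: end of options, so a needle starting with "-" is not parsed as a flag
    grep -rIl -- "$2" "$1"
}
```

For example, grep_text_only my_folder "needle text" lists only the text files that contain the needle.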

+6
Jul 06
 find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search" 

Unfortunately, this is not space-safe. Putting it into a bash script makes it a bit easier.

This version is space-safe:

 #!/bin/bash
 # Print usage and stop if no search term was given
 if [ -z "$1" ]; then
     echo "Usage: $0 <search>"
     exit 1
 fi
 find . -type f -print0 \
     | xargs -0 file \
     | grep -P text \
     | cut -d: -f1 \
     | xargs -i% grep -Pil "$1" "%"
+4
Jan 22

How about this:

 $ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' 

If you need the file names without the file types, just add a final sed filter.

 $ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||' 

You can filter out unwanted file types by adding more -e 'type' options to the last grep.

EDIT:

If your version of xargs supports the -d option, the commands above become simpler:

 $ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||' 
+2
Jan 22 '11 at 11:15

Here is how I did it ...

1. Make a small script named istext to check whether a file is plain text:

 #!/bin/bash
 # file -bi prints the MIME type, e.g. "text/plain; charset=us-ascii"
 [[ "$(file -bi "$1")" == text/* ]]

2. Then use find as usual:

 find . -type f -exec istext {} \; -exec grep -nHi mystring {} \; 
+2
Mar 16

I have two problems with the top-voted answer (the find ... -exec grep -Iq one):

  • It only lists text files. It does not actually search them, as was requested. To search, use

     find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text" 
  • It spawns a grep process for every file, which is very slow. A better solution is then

     find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text" 

    It takes only 0.2 s compared to 4 s for the solution above (2.5 GB of data / 7700 files), i.e. 20 times faster.
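The batched pipeline can be sketched as a function. The name search_text_batched is made up, and the -r, -H and -- additions are assumptions of this sketch rather than part of the answer: xargs -r (supported by GNU xargs) keeps grep from waiting on stdin when no files are found, and grep -H always prints the file name.

```shell
# search_text_batched is a hypothetical name used for illustration.
# Usage: search_text_batched DIRECTORY NEEDLE_TEXT
search_text_batched() {
    # Stage 1: list all regular files, NUL-separated (safe for odd file names).
    # Stage 2: keep only non-binary files; -Z makes grep -l emit NUL-separated names.
    # Stage 3: search the surviving files for the needle, many files per grep process.
    find "$1" -type f -print0 |
        xargs -0 -r grep -IZl . |
        xargs -0 -r grep -H -- "$2"
}
```

Because xargs passes many file names to each grep invocation, only a handful of processes are started instead of one per file.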

In addition, nobody has mentioned ag (The Silver Searcher) or ack-grep as alternatives. If one of them is available, these are much better options:

 ag -t "needle text"    # Much faster than ack
 ack -t "needle text"   # or ack-grep

As a final note, beware of false positives (binary files taken for text files). I have already had false positives with grep, ag and ack, so it is better to list the matched files first before editing them.

+1
Feb 03 '16 at 17:55

Although this is an old question, I think the information below will add to the quality of the answers here.

To ignore files with the executable bit set, I just use this command:

 find . ! -perm -111 

To keep it from recursing into other directories:

 find . -maxdepth 1 ! -perm -111 

There is no need for pipes to chain many commands, just the powerful find command on its own.

  • Disclaimer: this is not exactly what the OP asked for, because it does not check whether a file is binary or not. For example, it filters out bash script files, which are text themselves but have the executable bit set.

However, I hope this is useful to everyone.

+1
Apr 15 '17 at 1:41

Another way to do this:

 # find . -type f -print0 | xargs -0 file | grep "ASCII text"

If you need empty files too:

 # find . -type f -print0 | xargs -0 file | grep -E "ASCII text|empty"
+1
Nov 03 '17 at 21:43

I do it this way: 1) since there are too many files (~30k) to search through, I generate a list of the text files daily via crontab with the following command:

 find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list & 

2) create a function in .bashrc:

 findex() {
     cat ~/.src_list | xargs grep "$*" 2>/dev/null
 }

Then I can use the command below to do a search:

 findex "needle text" 

HTH :)
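A variant of this cached-list idea that tolerates spaces in file names might look like the sketch below. It swaps in grep -I for file to detect text files and stores the list NUL-separated. The function names and the explicit list-file argument are hypothetical, and xargs -r assumes GNU xargs.

```shell
# Hypothetical helper names; the list path is passed explicitly.
# Usage: build_text_list DIRECTORY LIST_FILE
build_text_list() {
    # Write the names of all non-binary files, NUL-separated, to LIST_FILE.
    find "$1" -type f -print0 | xargs -0 -r grep -IZl . > "$2"
}

# Usage: search_text_list LIST_FILE NEEDLE_TEXT
search_text_list() {
    # Read the cached names and search only those files.
    xargs -0 -r grep -H -- "$2" < "$1" 2>/dev/null
}
```

The first function would go in the crontab entry and the second in .bashrc, in place of the cat ... | xargs grep pipeline.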

0
Dec 26

I prefer xargs

 find . -type f | xargs grep -I "needle text" 

If your file names are weird, use the -0 options:

 find . -type f -print0 | xargs -0 grep -I "needle text" 
0
Nov 04 '14 at 15:49
  • A bash example that searches for the text "eth0" in /etc, in all text/ASCII files:

grep eth0 $(find /etc/ -type f -exec file {} \; | egrep -i "text|ascii" | cut -d ':' -f1)

0
Apr 01 '16 at 14:49

Here is a simplified version with an extended explanation for beginners like me who are trying to learn how to add more than one command on a single line.

If you were to write out the problem step by step, it would look like this:

 // For every file in this directory
 // Check the filetype
 // If it is an ASCII file, then print out the filename

For this we can use three UNIX commands: find, file and grep.

find checks every file in the directory.

file gives us the file type. In our case, we are looking for a result of "ASCII text".

grep searches for the keyword "ASCII" in the output of file.

So how can we combine them into one line? There are several ways to do this, but I believe that doing it in the order of our pseudo-code makes sense (especially for a beginner like me).

find ./ -exec file {} ";" | grep 'ASCII'

It looks complicated, but it is not bad once we break it down:

find ./ = look through every file in this directory. The find command prints the name of any file that matches the "expression", that is, whatever comes after the path, which in our case is the current directory, ./

The most important thing to understand is that everything after that first bit is evaluated as either True or False. If True, the file name is printed. If not, the command moves on.

-exec = this flag is an option in the find command, which allows us to use the result of some other command as a search expression. It is like calling a function inside a function.

file {} = the command called inside find. The file command returns a string that tells you the type of a file. Normally it would look like this: file mytextfile.txt. In our case, we want it to use whatever file find is currently looking at, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we are just asking the system to output a line for every file in the directory.

";" = this is required by find and is the punctuation mark at the end of our -exec command. See the manual for "find" for details if you need it by running man find .

| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on its left and uses it as input for whatever is on its right. Here it takes the output of find (one line per file, giving that file's name and type), and grep checks whether each line contains the string 'ASCII'. Lines that do are printed.

NOW, only the lines for files whose type contains 'ASCII' make it past the grep to the right of find ./. Voila.
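The point about find expressions being evaluated as true or false can be seen with -exec on its own: -exec COMMAND {} \; counts as true when COMMAND exits with status 0, and find only moves on to a following action such as -print in that case. A minimal sketch, using the standard test -s check (file exists and is not empty) and a made-up function name:

```shell
# list_nonempty_files is a hypothetical name used for illustration.
# Usage: list_nonempty_files [DIRECTORY]   (defaults to the current directory)
list_nonempty_files() {
    # For each regular file, run "test -s FILE".
    # -print is reached only when test exits with status 0 (file is non-empty).
    find "${1:-.}" -type f -exec test -s {} \; -print
}
```

Swapping test -s for any other command that signals success through its exit status, such as grep -q, gives the same kind of filter.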

0
Dec 06 '16 at 22:28

How about this

  find . -type f|xargs grep "needle text" 
-3
Jan 22 '11 at 11:09