How reliable is the -B file test?

When I open an SQLite database file, there is a lot of readable text at the beginning of the file. How likely is it that an SQLite file will be wrongly filtered out by the -B file test?

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use 5.10.1;
    use File::Find;

    my $dir = shift;
    # Start with an empty array ref so the final count works under
    # "use strict" even when no database is found.
    my $databases = [];

    find( {
        wanted => sub {
            my $file = $File::Find::name;
            return if not -B $file;    # skip files that look like text
            return if not -s $file;    # skip empty files
            return if not -r $file;    # skip unreadable files
            say $file;
            open my $fh, '<', $file or die "$file: $!";
            my $firstline = readline( $fh ) // '';
            close $fh or die $!;
            push @$databases, $file if $firstline =~ /\ASQLite\sformat/;
        },
        no_chdir => 1,
    }, $dir );

    say scalar @$databases;
2 answers

The perlfunc man page says the following about -T and -B:

    The -T and -B switches work as follows. The first block or so of the file is examined for odd characters such as strange control codes or characters with the high bit set. If too many strange characters (>30%) are found, it's a -B file; otherwise it's a -T file. Also, any file containing a zero byte in the first block is considered a binary file.
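The rule quoted above can be approximated in a few lines of Perl. This is only a sketch of the documented behaviour, not perl's actual implementation: the helper name looks_binary and the exact set of characters counted as "odd" are assumptions made for illustration. The sample header relies on the SQLite file format documentation, which describes the file as starting with the 16-byte string "SQLite format 3" followed by a zero byte.

```perl
use strict;
use warnings;

# Rough approximation of the documented -B rule (hypothetical helper,
# NOT perl's real implementation).
sub looks_binary {
    my ($block) = @_;

    # perlfunc: both -T and -B return true on an empty file.
    return 1 if $block eq '';

    # Any zero byte in the first block means "binary".
    return 1 if index( $block, "\0" ) >= 0;

    # Assumption: "odd" means control codes other than tab, newline and
    # carriage return, plus DEL and every byte with the high bit set.
    my $odd = ( $block =~ tr/\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\xff// );

    return $odd / length($block) > 0.30 ? 1 : 0;
}

# Header string as described in the SQLite file format documentation.
my $sqlite_start = "SQLite format 3\0" . ( "readable text " x 30 );

print looks_binary($sqlite_start)             ? "binary\n" : "text\n";  # binary
print looks_binary("just some plain text\n")  ? "binary\n" : "text\n";  # text
```

Under these assumptions the zero byte at the end of the header string is enough to classify the data as binary, however much readable text follows it.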

Of course, you could now run a statistical analysis over a number of SQLite files, examine their "first block or so" for "odd characters", and calculate how often they occur. That would give you an idea of how likely it is that -B fails for SQLite files.

However, you can also take the easy route. Could it fail? Yes: it is a heuristic, and a poor one at that. So do not use it.

On Unix, file type recognition is usually done by evaluating the contents of the file. And yes, there are people who have already done all the work for you: it is called libmagic (the library behind the file command-line tool). You can use it from Perl, for example with File::MMagic.
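For this one file type, the same content-based idea can also be sketched with core Perl only, by comparing the first 16 bytes against the header string given in the SQLite file format documentation. The helper name is_sqlite3 is hypothetical; it is not part of File::MMagic or libmagic.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file starts with the SQLite 3 header
# string ("SQLite format 3" followed by a zero byte, 16 bytes in total).
sub is_sqlite3 {
    my ($file) = @_;

    # Read raw bytes; treat unreadable files as "not SQLite".
    open my $fh, '<:raw', $file or return 0;
    my $n = read $fh, my $header, 16;
    close $fh;

    return 0 unless defined($n) && $n == 16;
    return $header eq "SQLite format 3\0" ? 1 : 0;
}

# Usage: print every file named on the command line that looks like SQLite 3.
print "$_\n" for grep { is_sqlite3($_) } @ARGV;
```

Inside the wanted callback this check could stand in for both the -B test and the first-line regex, since it looks only at the bytes that identify the format.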


Well, technically all files are a sequence of bytes and therefore binary. In addition, there is no universally accepted definition of what makes a file "binary", so the reliability of -B cannot be evaluated unless you first fix the definition it should be measured against.
