Finding the MIME type of a file using PHP is trivial - just use the PEAR MIME_Type package, PHP fileinfo, or call file -i on a Unix machine. This works very well for binary files and all others that have some kind of "magic bytes" through which they are easily detected.
Now I need to detect the correct MIME type of text files:
- CSS
- Diff
- INI (configuration)
- JavaScript
- reStructuredText (rst)
- SQL
They are all identified as text/plain, which is correct but too unspecific for me. I need the real type, even if it takes some time to parse the contents of the file.
So my question is: what are the solutions for detecting the MIME type of such text files? Any libraries? Code snippets?
Please note that I do not have a file name or file extension, but I have the contents of the file.
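For reference, here is a minimal sketch of the fileinfo baseline when only the contents are available. It relies on the fileinfo extension's finfo::buffer() method; the helper name detectMimeFromContents and the sample strings are hypothetical and only serve as illustration. The exact result depends on the installed magic database, but for inputs like these it is typically no more specific than text/plain.

```php
<?php
// Hypothetical helper: detect a MIME type from raw contents only (no file name).
// Requires the fileinfo extension.
function detectMimeFromContents(string $contents): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->buffer($contents);

    // finfo::buffer() returns false on failure; fall back to a generic type.
    return $type === false ? 'application/octet-stream' : $type;
}

// Illustrative sample contents (assumptions, not real test files).
$samples = [
    'css' => "body { color: red; }\n",
    'sql' => "SELECT id, name FROM users WHERE id = 1;\n",
    'ini' => "[section]\nkey = value\n",
];

foreach ($samples as $label => $contents) {
    echo $label, ': ', detectMimeFromContents($contents), PHP_EOL;
}
```

This is the behavior I would like to improve on: the call works without a file name, but it does not distinguish between the text formats listed above.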
If I used Ruby, I could integrate GitHub's linguist. Ohloh's ohcount is written in C, but has a command-line tool to detect the type: ohcount -d $file
What I tried
ohcount
Detects XML and PHP files correctly, but none of the others.
Apache Tika
Detects XML and HTML; all other test files are reported only as text/plain.
cweiske