On Linux / Unix-based systems you can use the file command, but I assume you want to do this yourself in code.
If all you have access to is a file byte stream, you will need to process each file type independently.
Most programs and components that do this kind of detection read the first few bytes of the file and classify it based on those. For example, every GIF begins with one of two strings: GIF87a or GIF89a.
Many file formats always start with the same signature, or always use the same header layout. This signature is called a magic number, as I described in this post.
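To make the magic-number idea concrete, here is a minimal Python sketch of that kind of check. The function name `sniff_type` and the signature table are my own illustration, not taken from any particular library, and the table only covers a handful of well-known formats. It also assumes the stream returns the requested bytes from a single read, which holds for regular files and in-memory buffers but not necessarily for sockets or pipes.

```python
import io
from typing import BinaryIO, Optional

# Illustrative table of well-known signatures (not exhaustive).
# Each entry is (leading bytes, label); the labels are hypothetical names.
SIGNATURES = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
]

# Read only as many bytes as the longest signature needs.
MAX_SIGNATURE_LEN = max(len(sig) for sig, _ in SIGNATURES)


def sniff_type(stream: BinaryIO) -> Optional[str]:
    """Return a label for the stream's leading bytes, or None if unknown."""
    header = stream.read(MAX_SIGNATURE_LEN)
    for sig, label in SIGNATURES:
        if header.startswith(sig):
            return label
    return None


if __name__ == "__main__":
    sample = io.BytesIO(b"GIF89a" + b"\x00" * 10)
    print(sniff_type(sample))
```

Keep in mind that a matching signature only tells you what the file claims to be. Container formats also share signatures: many formats built on ZIP start with the same bytes, so they need a closer look at the contents to tell apart.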
A good place to start is www.wotsit.org. It hosts file format specifications that are searchable by file type. Look up the file types you care about and see whether each one has an identifying signature or header you can test for.
You can also search Google for a library that performs this classification, or look at the source code of the file command.
Brian R. Bondy