Here is a link to the ForensicsWiki, which describes many different types of files. It describes the headers of DOC and DOCX files, so you should be able to read the first few bytes of a file and determine which format it is.
According to that page, .doc files are OLE Compound Files, so the file should begin with the following binary header:
d0 cf 11 e0 a1 b1 1a e1
In contrast, .docx files are ZIP archives, so they begin with the ZIP binary signature (ASCII "PK"):
50 4b
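As a rough illustration, here is a minimal Python sketch of that check. The function names `sniff_word_format` and `sniff_word_file` are made up for this example, and it only compares the leading bytes against the two signatures above. Keep in mind that these signatures identify the container, not Word specifically: other OLE files (for example .xls) and other ZIP-based files (for example .xlsx or a plain .zip) start with the same bytes, so treat a match as a first guess rather than proof.

```python
# Signatures taken from the answer above.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE Compound File (.doc)
ZIP_MAGIC = bytes.fromhex("504b")              # ZIP container, ASCII "PK" (.docx)


def sniff_word_format(header: bytes) -> str:
    """Guess the container type from the first bytes of a file.

    Returns "doc", "docx", or "unknown". This checks the container
    signature only; it does not confirm the file is a Word document.
    """
    if header.startswith(OLE_MAGIC):
        return "doc"
    if header.startswith(ZIP_MAGIC):
        return "docx"
    return "unknown"


def sniff_word_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_word_format(f.read(8))
```

You would call it as `sniff_word_file("report.doc")`, where the path is whatever file you want to check. To go further than the container check, you could open a ZIP match with a ZIP library and look for the Word-specific parts inside it.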
samoz