I'm glad to hear that you're doing this in C++. Nobody seems to see C++ as "necessary" anymore. It's all C# this and ASP.NET that... Even I work in an all-C# house, when I swore I would never switch, since C++ does everything I will ever need and then some. I'm old enough to clean up my own memory! heh.. Anyway, back to the problem at hand...
DefineDosDevice() is the function you use to assign drive letters and port names (LPT1, COM1, etc.). You give it the name, some flags, and the "path" that handles this device. But don't let that fool you. It is not a file system path; it is an NT object path. I'm sure you've seen them as "\Device\HardDisk0", etc. You can use WinObj.exe from Sysinternals to see what I mean. Anyway, you can create a device driver, point an MS-DOS symbolic link at it, and you're off and running. But granted, that seems like a lot of work for the original problem.
How many of these mega-gigabyte files are in a typical directory? You may be best off sticking all the files inside one giant file and storing an index file (or a header for each file) next to it that points to the next "file" in your "virtual FileSystem" file.
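To make the "one giant file plus an index" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the layout (a count, then fixed-size index entries, then the raw data), the names `IndexEntry`, `pack_container` and `read_member`, and the use of an in-memory `std::string` in place of a real multi-gigabyte file. Fields are written in the host's byte order, so this is not a portable on-disk format as written.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout (not MARC): [u32 count][count x IndexEntry][raw data...]
struct IndexEntry {
    char     name[56];  // NUL-terminated member name
    uint64_t size;      // length of the member's data in bytes
    uint64_t offset;    // absolute offset of the data inside the container
};

// Packs (name, data) pairs into a single container image held in memory.
std::string pack_container(const std::vector<std::pair<std::string, std::string>>& files) {
    const uint32_t count = static_cast<uint32_t>(files.size());
    uint64_t offset = sizeof(count) + uint64_t(count) * sizeof(IndexEntry);
    std::string out(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const auto& f : files) {
        IndexEntry e{};  // zero-filled, so the name stays NUL-terminated
        std::strncpy(e.name, f.first.c_str(), sizeof(e.name) - 1);
        e.size = f.second.size();
        e.offset = offset;
        offset += e.size;
        out.append(reinterpret_cast<const char*>(&e), sizeof(e));
    }
    for (const auto& f : files) out += f.second;  // data follows the index
    return out;
}

// Finds a member by name and copies its bytes out; returns false if absent or corrupt.
bool read_member(const std::string& image, const std::string& name, std::string& data) {
    uint32_t count = 0;
    if (image.size() < sizeof(count)) return false;
    std::memcpy(&count, image.data(), sizeof(count));
    for (uint32_t i = 0; i < count; ++i) {
        IndexEntry e;
        const std::size_t pos = sizeof(count) + std::size_t(i) * sizeof(IndexEntry);
        if (pos + sizeof(e) > image.size()) return false;
        std::memcpy(&e, image.data() + pos, sizeof(e));
        e.name[sizeof(e.name) - 1] = '\0';
        if (name == e.name) {
            if (e.offset > image.size() || e.size > image.size() - e.offset) return false;
            data.assign(image, static_cast<std::size_t>(e.offset), static_cast<std::size_t>(e.size));
            return true;
        }
    }
    return false;
}
```

With real files you would write the same bytes through your file API of choice and seek to `offset` instead of slicing a string; the index is the only part you ever need to scan.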
A good example to look at is the Microsoft MSN Archive format. I reverse-engineered this archive format when I worked at an AV company, and it's actually quite creative, yet VERY simple. It can all be done in one file, and if you want to get fancy, you COULD store the data across 3 files in a RAID 5 type configuration, so if any one of the 3 files gets hosed, you can rebuild it from the others. Plus, users would simply see 3 VERY large files in a directory and would not be able to access the individual (internal) files.
I have provided code below that unpacks one of these MSN Archive formats. I do not have code that CREATES one, but from the extraction source you could build / write one without any problems. If files are deleted and / or renamed often, that could create a problem with unused space in the file, which would need to be trimmed from time to time.
This format even supports CRC fields, so you can check whether you got the file out intact. I was never able to completely reverse the algorithm Microsoft used to CRC the data, but I have a pretty good idea.
You will not be able to keep your current I/O routines as they are, meaning CreateFile() can't simply open any file in the archive. However, with the uber-coolness of C++, you can override the CreateFile call to implement your archive format.
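The "RAID 5 across 3 files" idea boils down to XOR parity. A minimal sketch, assuming the two data parts have been padded to equal length (the true lengths would live in your index) and using hypothetical names `Bytes` and `xor_bytes`:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// XORs two buffers byte by byte, treating the shorter one as zero-padded.
// parity = xor_bytes(a, b); a lost part is rebuilt as xor_bytes(survivor, parity).
Bytes xor_bytes(const Bytes& a, const Bytes& b) {
    const Bytes& longer  = a.size() >= b.size() ? a : b;
    const Bytes& shorter = a.size() >= b.size() ? b : a;
    Bytes out(longer);
    for (std::size_t i = 0; i < shorter.size(); ++i) out[i] ^= shorter[i];
    return out;
}
```

Real RAID 5 rotates the parity block across the members stripe by stripe; keeping all parity in the third file, as here, is closer to RAID 4, but the rebuild arithmetic is the same.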
If you need help, and this is a big enough problem, maybe we could talk offline and find a solution for you.
I do not mind writing you a FileSystemDriver, but for this we will have to start talking about compensation. I would be more than happy to give you direction and ideas for free, as now.
I'm not sure it's kosher for me to give you my email address here. I'm not sure about SO policy on this, since we could be talking about potential work / solicitation, but that is not my only intention; I would rather help you find your own solution.
Before you look into the device driver, download WinDDK. It has driver samples.
If you are wondering why I care so much about this, it's because years ago I had to write a driver similar to this one. It had to be compatible with Windows and OS X, and it would let users secure drive volumes (USB keys, removable volumes) WITHOUT installing any drivers or complex (and bulky, sometimes annoying) software. In recent years many hardware manufacturers have done similar things, but I don't think the security is all that secure. I was looking at using RSA and AES, the same way GPG and PGP do. Originally I was approached about this for what (I believe, but have no proof) was going to be protection for MP3 files. Since they would be stored in an encrypted format, they simply wouldn't work without the correct passphrase. But I saw other uses for it as well. (This was back when a 16 MB (yes, MEG) USB key cost more than $100 or so.)
This project also went along with my PC security system for the oil and gas industry, which used something similar to smart cards, only much easier to use, reuse / re-issue, and impossible (read: VERY difficult and unlikely) to crack, and you could even use it with your kids at home! (Since there's always a fight over who gets time on the computer, and who gets the most, and on, and on, and on and...)
Phew... I think I've wandered off topic here. Anyway, here is an example of the Microsoft MSN Archive format. See if you can use something like it, knowing that you can always skip right to a file by following the offsets, either while parsing the main file to find the requested entry or from pre-parsed data held in memory. And since you wouldn't be loading the raw binary data into memory, your only limit would probably be the 4 GB file limit on 32-bit machines.
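To illustrate "skipping right to a file" from pre-parsed data, here is a small sketch. It assumes the file table has already been read into a `std::map`, and the names `Extent` and `read_at` are made up for this example. It works on any `std::istream`, so a `std::ifstream` opened in binary mode would do for a real archive.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Where one member lives inside the archive (hypothetical pre-parsed index record).
struct Extent { uint64_t offset; uint64_t size; };

// Reads one member by seeking straight to its offset; the archive body is never scanned.
bool read_at(std::istream& archive, const std::map<std::string, Extent>& index,
             const std::string& name, std::string& out) {
    auto it = index.find(name);
    if (it == index.end()) return false;
    archive.clear();  // drop any EOF/fail state left by an earlier read
    archive.seekg(static_cast<std::streamoff>(it->second.offset), std::ios::beg);
    out.resize(static_cast<std::size_t>(it->second.size));
    if (!out.empty())
        archive.read(&out[0], static_cast<std::streamsize>(out.size()));
    return static_cast<bool>(archive);  // false on a bad seek or a short read
}
```

Keeping the offsets as 64-bit values in the index is what lets this scale past 4 GB, provided the underlying stream or file API supports 64-bit seeks on your platform.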
The MARC (Microsoft MSN Archive) format is laid out (loosely) as follows:
- 12-byte header (only one)
  - File magic
  - MARC version
  - Number of files (in the following table)
- 68-byte file header table (1 to NumFiles entries)
  - File name
  - File size
  - Checksum
  - Offset to raw file data
Now, the 68-byte file table entries use 32 bits each for the file size and the offset. For your VERY large files, you may need to widen these to 48- or 64-bit integers.
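As a rough sketch of what a widened entry could look like: the 76-byte layout below (56-byte name, 64-bit size, 32-bit checksum, 64-bit offset, all little-endian) is my own assumption, not part of the real MARC format, and the names `WideEntry`, `load_le` and `parse_wide_entry` are hypothetical. Decoding byte by byte avoids depending on struct padding or host endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kNameLen = 56;                          // same as MARC_FILENAME_LEN
constexpr std::size_t kWideEntrySize = kNameLen + 8 + 4 + 8;  // 76 bytes on disk (assumed)

// In-memory form of the hypothetical widened entry.
struct WideEntry {
    char     name[kNameLen];
    uint64_t size;
    uint32_t checksum;
    uint64_t offset;
};

// Decodes an n-byte little-endian unsigned integer starting at p.
static uint64_t load_le(const unsigned char* p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; --i) v = (v << 8) | p[i];
    return v;
}

// Parses one kWideEntrySize-byte record from a raw buffer.
WideEntry parse_wide_entry(const unsigned char* raw) {
    WideEntry e;
    std::memcpy(e.name, raw, kNameLen);
    e.name[kNameLen - 1] = '\0';  // force termination inside the array
    e.size     = load_le(raw + kNameLen, 8);
    e.checksum = static_cast<uint32_t>(load_le(raw + kNameLen + 8, 4));
    e.offset   = load_le(raw + kNameLen + 12, 8);
    return e;
}
```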
Here is the code I wrote for processing.
    #define MARC_FILE_MAGIC     0x4352414D      // "MARC" read as a little-endian 32-bit value
    #define MARC_FILENAME_LEN   56              // (You'll notice this is rather small)
    #define MARC_HEADER_SIZE    12
    #define MARC_FILE_ENT_SIZE  68
    #define MARC_DATA_SIZE      (1024 * 128)    // 128k read buffer should be enough.

    #define MARC_ERR_OK         0               // No error
    #define MARC_ERR_OOD        314             // Out of data error
    #define MARC_ERR_OS         315             // Error returned by the OS
    #define MARC_ERR_CRC        316             // CRC error

    // The first three fields (12 bytes) mirror the on-disk header.
    struct marc_file_hdr
    {
        ULONG            h_magic;
        ULONG            h_version;
        ULONG            h_files;
        int              h_fd;      // in-memory only: descriptor of the open archive
        struct marc_dir *h_dir;     // in-memory only: head of the file table list
    };

    // On-disk file table entry: 68 bytes, assuming a 32-bit long (as on Windows).
    struct marc_file
    {
        char          f_filename[MARC_FILENAME_LEN];
        long          f_filesize;
        unsigned long f_checksum;
        long          f_offset;
    };

    struct marc_dir
    {
        struct marc_file *dir_file;
        ULONG             dir_filenum;
        struct marc_dir  *dir_next;
    };
That gives you an idea of the headers I wrote for them, and here is the open function. Yes, it's missing all the support calls, error routines, etc., but you get the idea. Please excuse the mix of C and C++ coding style. Our scanner had plenty of quirks like this... I used antique calls such as open(), fopen() to stay consistent with the rest of the code base.
    struct marc_file_hdr *marc_open(char *filename)
    {
        struct marc_file_hdr *fhdr = (struct marc_file_hdr*)malloc(sizeof(marc_file_hdr));
        if(fhdr == NULL)
            return NULL;
        fhdr->h_dir = NULL;
        fhdr->h_fd = -1;

    #if defined(_MSC_VER) && _MSC_VER >= 1400
        // _sopen_s takes a pointer to the descriptor and returns 0 on success.
        if(_sopen_s(&fhdr->h_fd, filename, _O_BINARY | _O_RDONLY, _SH_DENYWR, _S_IREAD | _S_IWRITE) != 0)
            fhdr->h_fd = -1;
    #else
        fhdr->h_fd = open(filename, _O_BINARY | _O_RDONLY);
    #endif
        if(fhdr->h_fd < 0)
        {
            marc_close(fhdr);
            return NULL;
        }

        // Once we have the file open, read all the file headers, and populate our main headers linked list.
        // Only the first MARC_HEADER_SIZE bytes of the struct come from disk.
        if(read(fhdr->h_fd, fhdr, MARC_HEADER_SIZE) != MARC_HEADER_SIZE)
        {
            errmsg("MARC: Could not read MARC header from file %s.\n", filename);
            marc_close(fhdr);
            return NULL;
        }

        // Verify the file magic
        if(fhdr->h_magic != MARC_FILE_MAGIC)
        {
            errmsg("MARC: Incorrect file magic %lx found in MARC file.\n", fhdr->h_magic);
            marc_close(fhdr);
            return NULL;
        }

        if(fhdr->h_files == 0)
        {
            errmsg("MARC: No files found in archive.\n");
            marc_close(fhdr);
            return NULL;
        }

        // Get all the file headers from this archive, and link them to the main header.
        // (Allocation checks are omitted here for brevity.)
        struct marc_dir *lastdir = NULL, *curdir = NULL;
        curdir = (struct marc_dir*)malloc(sizeof(marc_dir));
        curdir->dir_file = NULL;
        curdir->dir_next = NULL;
        fhdr->h_dir = curdir;

        for(ULONG x = 0; x < fhdr->h_files; x++)
        {
            if(lastdir)
            {
                lastdir->dir_next = (struct marc_dir*)malloc(sizeof(marc_dir));
                lastdir->dir_next->dir_next = NULL;
                curdir = lastdir->dir_next;
            }

            curdir->dir_file = (struct marc_file*)malloc(sizeof(marc_file));
            curdir->dir_filenum = x + 1;

            if(read(fhdr->h_fd, curdir->dir_file, MARC_FILE_ENT_SIZE) != MARC_FILE_ENT_SIZE)
            {
                errmsg("MARC: Could not read file header for file %lu\n", x);
                marc_close(fhdr);
                return NULL;
            }
            // LEF: Just a little extra insurance... (the last valid index is LEN - 1)
            curdir->dir_file->f_filename[MARC_FILENAME_LEN - 1] = '\0';

            lastdir = curdir;
        }
        lastdir->dir_next = NULL;

        return fhdr;
    }
Then you have a simple extraction method. Keep in mind this was strictly for virus scanning, so there are no search routines, etc. It was designed to simply dump a file out, scan it, and move on. Below is the CRC routine that I BELIEVE Microsoft used, but I'm not sure exactly what they CRC'ed. It might include the header data + file data, etc. I just haven't cared enough to go back and try to reverse it. Anyway, as you can see, there is no compression in this archive format, but it would be very easy to add. If you want, I can provide the full source. (I think all that's left is the close() routine and the code that calls and extracts each file, etc.!)
bool marc_extract(struct marc_file_hdr *marc, struct marc_file *marcfile, char *file, int &err) {
Here is my supposed CRC routine (I might have stolen it from Stuart Caie and libmspack, I can't remember):
    static unsigned long marc_checksum(void *pv, UINT cb, unsigned long seed)
    {
        int count = cb / 4;
        unsigned long csum = seed;
        BYTE *p = (BYTE*)pv;
        unsigned long ul;

        // XOR together every complete little-endian 32-bit word.
        while(count-- > 0)
        {
            ul = *p++;
            ul |= (((unsigned long)(*p++)) << 8);
            ul |= (((unsigned long)(*p++)) << 16);
            ul |= (((unsigned long)(*p++)) << 24);
            csum ^= ul;
        }

        // Fold in the 1 to 3 trailing bytes, if any.
        ul = 0;
        switch(cb % 4)
        {
            case 3: ul |= (((unsigned long)(*p++)) << 16);  // fall through
            case 2: ul |= (((unsigned long)(*p++)) << 8);   // fall through
            case 1: ul |= *p++;
            default: break;
        }
        csum ^= ul;

        return csum;
    }
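Because the routine takes a seed, you can checksum a huge file one read buffer at a time instead of holding it all in memory. The sketch below restates the same XOR scheme with fixed-width types (`xor_checksum`) and adds a hypothetical `chunked_checksum` helper. It assumes the chunk size is a non-zero multiple of 4 (as MARC_DATA_SIZE is); with any other chunk size the tail handling would make the result differ from a one-shot pass.

```cpp
#include <cstddef>
#include <cstdint>

// Same XOR-of-little-endian-words scheme as marc_checksum, with fixed-width types.
uint32_t xor_checksum(const uint8_t* p, std::size_t cb, uint32_t seed) {
    uint32_t csum = seed;
    for (std::size_t n = cb / 4; n > 0; --n, p += 4)
        csum ^= uint32_t(p[0]) | uint32_t(p[1]) << 8 | uint32_t(p[2]) << 16 | uint32_t(p[3]) << 24;
    uint32_t tail = 0;
    switch (cb % 4) {
        case 3: tail |= uint32_t(*p++) << 16;  // fall through
        case 2: tail |= uint32_t(*p++) << 8;   // fall through
        case 1: tail |= *p;
        default: break;
    }
    return csum ^ tail;
}

// Checksums a buffer chunk by chunk, feeding each result in as the next seed.
// chunk must be a non-zero multiple of 4 to match the one-shot result.
uint32_t chunked_checksum(const uint8_t* p, std::size_t cb, std::size_t chunk) {
    uint32_t csum = 0;
    while (cb > 0) {
        const std::size_t n = cb < chunk ? cb : chunk;
        csum = xor_checksum(p, n, csum);
        p += n;
        cb -= n;
    }
    return csum;
}
```

In a real extractor the loop body would be a read into the 128k buffer followed by one `xor_checksum` call, carrying the running value between reads.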
Well, I think this post is long enough ... Contact me if you need help or have any questions.