Is it possible to intercept the MATLAB save() byte stream?

In MATLAB, you can write objects or even the entire workspace to a file using the save() call. I would like to intercept the byte stream and post-process it before it reaches the file. Is this possible? Alternatively, can save() be given a file descriptor to write the byte stream to, instead of the file name that normally goes into the call as an argument?

Note that I am not looking for an alternative way to write a file from MATLAB. I know that I can fopen() a file and write whatever I want, but the point is that I want to (re)use the serialization that is internal to the save call rather than reinvent my own.

Of course, an analogous question arises for load(), where the bytes would be intercepted before they enter deserialization, but I think that once there is a solution for save(), the one for load() will follow naturally.

A few clarifications:

  • I am not looking for a new way to serialize MATLAB data. One already exists, and the whole point of the exercise is to use the existing serialization inside the save() call, so that 1) I do not have to keep updating serialization code for new object types in new MATLAB versions, or, God forbid, for custom OOP objects once people start using them, and 2) I can still easily use existing code for reading MAT-files, such as the scipy support for MAT-files.

  • The stream must not pass through a file or anything similar before processing starts. The idea is encryption for security, and writing the unencrypted stream to a file completely undermines that goal.

Complications:

  • It seems that the save function in MATLAB does more than plain sequential writing. Judging from the library object code, save appears to be implemented using matPutVariable (formerly called matPutArray), which writes a given variable of type mxArray* to a file of type MATFile* opened with matOpen. The problem is the following text in the description of matPutVariable:

    If the mxArray does not exist in the MAT-file, the function adds it to the end. If an mxArray with the same name exists in the file, the function replaces the existing mxArray with the new mxArray by overwriting the file.

    This means that matPutVariable has to seek through the file. Seeking is obviously not possible on pipes, so pipes cannot be used to implement the byte stream processing with this existing serialization function.

+6
serialization matlab save
10 answers

After thinking about this for several months, I would say no, this is not possible, at least not without hardcore, non-portable binary/ELF hacking.

0

How about using a virtual file system? For Windows there is a commercial library called BoxedApp SDK that lets you create a virtual file that is visible only to the creating process (and possibly its children). You will probably have to write a MEX interface to the library. First you create the virtual file, and then you can use the save command in MATLAB with the same file name. After that you can read the serialized .mat stream using the normal fopen/fread functions in MATLAB and do whatever you want with it. This at least prevents a file from being created on the hard drive. I am not sure whether the file, or parts of it, could end up in the page file in some situations, since the file is actually created in memory.

libmx also has the undocumented functions mxSerialize and mxDeserialize, which you could call, for example, through loadlibrary/calllib directly from MATLAB or through a MEX wrapper. A little googling suggested that the signatures of these functions should be

    mxArray* mxSerialize(const mxArray*);
    mxArray* mxDeserialize(const void*, size_t);

and some tests showed that mxSerialize() takes a MATLAB variable as its argument and returns the serialized bytes as a uint8 array. mxDeserialize() converts this uint8 array (first argument) back into a MATLAB object, which it returns. The second argument to mxDeserialize is the number of elements in the first argument. However, these undocumented functions are not guaranteed to keep working in the future, because TMW (The MathWorks) may change the API.

+5

EDIT: (based on comments) Hmm, my old answer probably doesn't help much. I don't know how you would intercept the byte stream, but I suppose you have one option (admittedly a bit of a kludge): just let SAVE create the file, then immediately read the data back from the file byte by byte, process it, and write it to a file again. Something like:

    save('workspace.mat');
    fid = fopen('workspace.mat','r');
    byteData = fread(fid,inf,'*uint8');
    fclose(fid);
    %# ... Process byteData here ...
    fid = fopen('workspace.mat','w');
    fwrite(fid,byteData,'uint8');
    fclose(fid);

Old answer:

For objects of custom classes, I believe what you are looking for is provided by the overloaded SAVEOBJ and LOADOBJ methods, which are called on the object before it is saved to or after it is loaded from a file. When saving objects to or loading them from .MAT files, you can use these methods to change the save/load process so that objects are formatted differently. However, I do not think you can do this for built-in data types, only for custom objects.
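As a rough sketch of the SAVEOBJ/LOADOBJ hook described above, a custom class could transform its own data on the way to and from the MAT-file. The class name MaskedData, its Values property, and the fixed XOR key are hypothetical, chosen only for illustration. The XOR step is a placeholder for real processing and is not encryption, and the sketch assumes Values holds a real, full double array.

```matlab
% MaskedData.m  (hypothetical example class, not part of MATLAB)
classdef MaskedData
    properties
        Values                      % assumed to be a real, full double array
    end
    properties (Constant)
        Key = uint8(90);            % placeholder XOR key, NOT real encryption
    end
    methods
        function obj = MaskedData(values)
            if nargin > 0
                obj.Values = values;
            end
        end
        function s = saveobj(obj)
            % Called by save(): return a struct that is stored in place of the object.
            raw = typecast(obj.Values(:).', 'uint8');   % view the doubles as raw bytes
            s.Masked = bitxor(raw, MaskedData.Key);     % post-process the bytes
            s.Size = size(obj.Values);                  % needed to restore the shape
        end
    end
    methods (Static)
        function obj = loadobj(s)
            % Called by load(): receives the struct produced by saveobj.
            raw = bitxor(s.Masked, MaskedData.Key);     % undo the processing
            obj = MaskedData(reshape(typecast(raw, 'double'), s.Size));
        end
    end
end
```

With such a class on the path, save and load call these methods automatically for MaskedData variables; variables of built-in types saved alongside them are written unchanged.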

+2

For HG (Handle Graphics) objects, you can intercept save processing through internal (editable) *.m files, as explained here: http://undocumentedmatlab.com/blog/handle2struct-struct2handle-and-matlab-8/

+2

It is probably best to write the mat file on tmpfs / ramdisk and then encrypt it before saving it to disk. You sacrifice portability and rely on the OS to provide secure virtual memory, but if you can't even trust the local drive, you probably won't be able to achieve satisfactory security.
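A minimal sketch of the tmpfs idea from the MATLAB side, assuming a Linux system where /dev/shm is a RAM-backed mount (an assumption; check your own setup). The variable names, the output file name workspace.enc, and the XOR step are hypothetical placeholders; a real cipher would go where the XOR is.

```matlab
% Assumption: /dev/shm is a tmpfs (RAM-backed) mount, as on many Linux systems.
ramFile = [tempname('/dev/shm') '.mat'];    % unique file name inside the RAM disk

x = magic(4);                               % example data to protect
save(ramFile, 'x');                         % save() serializes into RAM-backed storage

% Read the serialized MAT-file back as raw bytes, then remove the plain copy.
fid = fopen(ramFile, 'r');
plainBytes = fread(fid, inf, '*uint8');
fclose(fid);
delete(ramFile);

% Placeholder processing step (NOT real encryption): XOR with a fixed byte.
processed = bitxor(plainBytes, uint8(90));

% Only the processed bytes are written to the regular disk.
outFile = fullfile(pwd, 'workspace.enc');   % hypothetical output name
fid = fopen(outFile, 'w');
fwrite(fid, processed, 'uint8');
fclose(fid);
```

As the answer notes, this relies on the OS not swapping the RAM disk contents out to an untrusted device.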

By the way, why can't you trust the local disk at all, not even enough to put your temporary file in a directory whose permissions allow access only to the user who owns the MATLAB process (and root)? Are you trying to implement a DRM system?

+2

Could you encrypt the contents of the variables?

With whos you get a list of all your variables in alphabetical order. For each of them, you create a mask of the same size using your encryption algorithm and replace the true value with its XOR with the mask. Finally, you save the encrypted variables with save. The names and sizes of your variables remain visible, but this is probably not critical (you can also encrypt the names if necessary).

Follow the same steps in reverse when loading.
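The masking idea for a single variable can be sketched as follows. The mask here is a fixed arithmetic pattern used purely as a placeholder (an assumption for illustration); in practice it would come from the keystream of a real cipher. The variable names are hypothetical.

```matlab
x = magic(4);                                   % example variable to protect
raw = typecast(x(:).', 'uint8');                % view the values as raw bytes

% Placeholder mask of the same length as the data (NOT cryptographically secure).
mask = uint8(mod(37 * (1:numel(raw)) + 11, 256));

xEnc = bitxor(raw, mask);                       % masked bytes that get saved
matFile = [tempname '.mat'];
save(matFile, 'xEnc');                          % only masked data reaches the file

% Loading: read the masked bytes, XOR with the same mask, restore type and shape.
s = load(matFile);
xDec = reshape(typecast(bitxor(s.xEnc, mask), class(x)), size(x));
delete(matFile);

disp(isequal(xDec, x))                          % check the round trip
```

Note that the original class and size have to be kept somewhere (here they are taken from x itself) so that the bytes can be turned back into the original array.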

+2

Use getByteStreamFromArray and getArrayFromByteStream for serialization / deserialization. You can change the resulting bytes before writing them to a file.

    % A cell array of several data types
    >> byteStream = getByteStreamFromArray({pi, 'abc', struct('a',5)});  % 1x312 uint8 array
    >> getArrayFromByteStream(byteStream)
    ans =
        [3.14159265358979]    'abc'    [1x1 struct]

As explained at http://undocumentedmatlab.com/blog/serializing-deserializing-matlab-data
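Putting this together with the "change the bytes before writing" step, a round trip might look like the sketch below. It relies on the undocumented getByteStreamFromArray and getArrayFromByteStream functions named above, so it may break in other MATLAB releases. The XOR key and the temporary file name are hypothetical placeholders for a real processing step.

```matlab
data = {pi, 'abc', struct('a', 5)};          % example data of mixed types

% Serialize in memory (undocumented function; behavior may change between releases).
bytes = getByteStreamFromArray(data);

% Placeholder processing (NOT real encryption): XOR every byte with a fixed key.
key = uint8(165);
processed = bitxor(bytes, key);

% Only the processed bytes are written to disk.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, processed, 'uint8');
fclose(fid);

% Loading: read the bytes, undo the processing, then deserialize.
fid = fopen(fname, 'r');
readBack = fread(fid, inf, '*uint8').';      % row vector, like the original stream
fclose(fid);
delete(fname);
restored = getArrayFromByteStream(bitxor(readBack, key));

disp(isequal(restored, data))                % check the round trip
```

Unlike save, this never produces an unprocessed file, but the result is not a MAT-file, so tools such as scipy cannot read it directly.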

+2

Perhaps you could do something like the following:

    %# serialize objects into a byte array using Java
    bout = java.io.ByteArrayOutputStream();
    out = java.io.ObjectOutputStream(bout);
    out.writeObject( rand(3) )             %# MATLAB matrix
    out.writeObject( num2cell(rand(3)) )   %# MATLAB cell array
    out.flush()
    out.close()
    bout.close()
    b = bout.toByteArray();                %# vector of type int8
    %# perform processing on `b` ...
    %# write byte[] stream to file
    save file.mat b

Then, in the opposite direction, you simply load the saved MAT-file, reverse any processing you performed, and deserialize the byte stream to get the original objects back.

    %# load MAT-file
    load file.mat b
    b = typecast(b,'int8');          %# cast as int8 just to be sure
    %# undo any processing on `b`...
    %# deserialize
    in = java.io.ObjectInputStream( java.io.ByteArrayInputStream(b) );
    X1 = double( in.readObject() )   %# recover matrix
    X2 = cell( in.readObject() )     %# recover cell array
    in.close()

Note that you will need to keep track of the variables' meta-information yourself, for example their number and types (maybe you can store it inside the same MAT-file), and use your own wrapper functions to take care of all the marshaling, but you get the idea...


I also came across a few submissions on the FEX (File Exchange) that help with serializing/deserializing MATLAB types:

+1

I am also interested in this problem. I found a few things, but none of them work:

  • save stdio: you will find this hidden feature, but it does not work.
  • engGetArray / engPutArray: these routines let you copy a variable from and to the workspace.

Look at the specification of the MAT-file format; maybe we can reproduce MATLAB's serialization in a MEX file:

Update:

I found something very interesting: run this command in the MATLAB console

 edit([matlabroot '/extern/examples/eng_mat/matcreat.c']); 

or

 edit([matlabroot '/extern/examples/eng_mat/matcreat.cpp']); 

This is the documentation on how to compile it: http://www.mathworks.com/help/techdoc/matlab_external/f14500.html

In my opinion, it should be possible to use STDOUT in the pmat = matOpen(file, "w"); call.

+1

Step 1: mkfifo /tmp/fifo - This creates a FIFO, a file name that represents a pipe. Everything written into the pipe stays there until a process reads it out of the pipe. The data never reaches the disk.

Step 2: Run this in one terminal: openssl enc -aes-256-cbc -a -e -in fifo -out safe - This runs the OpenSSL program for encryption using AES with a 256-bit key in CBC mode (openssl supports many more ciphers and options; pick the one that works for you, this is a safe default); -a Base64-encodes the output (which is good for testing, but you can probably turn it off in real use, since Base64 increases the size by a factor of 4/3); -e selects encryption mode; -in fifo says the input file is named fifo (you may use the full path); -out safe says the output file is named safe (again, you may use the full path). OpenSSL will sleep until data arrives in the pipe.

OpenSSL will prompt you for a passphrase once some data arrives in the pipe.

Test this: run "echo foo > /tmp/fifo" in another terminal. You will see the password prompt in the first terminal; enter the password and confirm it, then view the contents of the file "safe":

    $ openssl enc -aes-256-cbc -a -e -in fifo -out safe
    # (in another terminal, "echo foo > fifo")
    enter aes-256-cbc encryption password:
    Verifying - enter aes-256-cbc encryption password:
    $ cat safe
    U2FsdGVkX18aWBw0Uz8N3SfrRg4PigL609F+HQPuc6o=

Check the other direction:

    $ openssl enc -aes-256-cbc -a -d -in safe
    enter aes-256-cbc decryption password:
    foo

Now run the OpenSSL command from step 2 again: openssl enc -aes-256-cbc -a -e -in fifo -out safe , then start MATLAB and pass /tmp/fifo to the SAVE() command.

It is quite possible that MATLAB will do something stupid, like deleting any existing file with the given name, in which case you will find your unencrypted data in a regular file named /tmp/fifo . So test with some non-sensitive data first. But I hope that MATLAB was written with Unix tools in mind and simply writes to the named pipe you give it.

-1
