How to deserialize classes in Psych?

How can I deserialize in Psych so that loading returns an existing object, such as a class object?

To serialize a class, I can do

 require "psych"

 class Class
   yaml_tag 'class'

   def encode_with coder
     coder.represent_scalar 'class', name
   end
 end

 yaml_string = Psych.dump(String) # => "--- !<class> String\n...\n"

but if I call Psych.load on that, I get an anonymous class rather than the String class.

The usual way to deserialize is to define Object#init_with(coder), but that only changes the state of the existing anonymous class, whereas I want the String class itself.
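To make that limitation concrete, here is a minimal sketch of the encode_with / init_with pair on an ordinary class. Point and its via_initialize flag are hypothetical names used only for illustration. Loading goes through Psych.unsafe_load where it exists, because newer Psych versions refuse to load arbitrary classes with plain Psych.load.

```ruby
require "psych"

# Point is a hypothetical class used only for illustration.
class Point
  attr_reader :x, :y, :via_initialize

  def initialize(x, y)
    @x = x
    @y = y
    @via_initialize = true
  end

  # Called by Psych when dumping: store the object's state in the coder.
  def encode_with(coder)
    coder["x"] = @x
    coder["y"] = @y
  end

  # Called by Psych when loading, on an object Psych has already allocated.
  def init_with(coder)
    @x = coder["x"]
    @y = coder["y"]
  end
end

yaml = Psych.dump(Point.new(1, 2))

# Newer Psych versions restrict Psych.load, so prefer unsafe_load if present.
loader = Psych.respond_to?(:unsafe_load) ? :unsafe_load : :load
loaded = Psych.public_send(loader, yaml)

puts loaded.x                       # state restored by init_with
puts loaded.via_initialize.inspect  # initialize was never called
```

Psych allocates the object before init_with runs, so init_with can fill in state but cannot decide which object is returned. That is why it cannot hand back the existing String class.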

Psych::Visitors::ToRuby#visit_Psych_Nodes_Scalar(o) has cases where, instead of modifying an existing object with init_with, it makes sure the correct object is created in the first place (for example, calling Complex(o.value) to deserialize a complex number), but I don't think I should be monkeypatching that method.

Am I doomed to working with the low-level or mid-level emitting APIs, or am I missing something?

Background

I'll describe the project, why it needs classes, and why it needs (de)serialization.

Project

The Small Eigen Collider aims to create random tasks for Ruby to run. The original goal was to see whether different Ruby implementations (e.g. Rubinius and JRuby) returned the same results when given the same random tasks, but I found that it is also good at finding ways to segfault Rubinius and YARV.

Each task consists of the following:

 receiver.send(method_name, *parameters, &block) 

where receiver is a randomly chosen object, method_name is the name of a randomly chosen method, and *parameters is an array of randomly chosen objects. &block is not very random: it is basically equivalent to {|o| o.inspect}.

For example, if the receiver was "a", the method name was :casecmp, and the parameters were ["b"], then you would call

 "a".send(:casecmp, "b") {|x| x.inspect} 

which is equivalent (since the block doesn't matter) to

 "a".casecmp("b") 

The Small Eigen Collider runs this code and logs the inputs along with the return value. In this example, most implementations of Ruby return -1, but at one point Rubinius returned +1. (I filed this as a bug, https://github.com/evanphx/rubinius/issues/518, and the Rubinius maintainers fixed it.)
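As a rough sketch of what running and logging one task could look like (run_task and the shape of the returned hash are hypothetical, not taken from the Small Eigen Collider source):

```ruby
# Hypothetical helper: run one task and record the inputs and the outcome.
def run_task(receiver, method_name, parameters)
  result = receiver.send(method_name, *parameters) { |o| o.inspect }
  { receiver: receiver, method_name: method_name,
    parameters: parameters, result: result }
rescue StandardError => e
  # Record the exception class instead of a return value.
  { receiver: receiver, method_name: method_name,
    parameters: parameters, error: e.class.name }
end

outcome = run_task("a", :casecmp, ["b"])
puts outcome[:result]  # -1 on implementations that agree with MRI

failed = run_task("a", :no_such_method, [])
puts failed[:error]
```

A segfault cannot be rescued this way, of course. Catching one would need the inputs logged before the call is made.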

Why classes?

I want to be able to use class objects in my Small Eigen Collider. They would usually be the receiver, but they could also be one of the parameters.

For example, I found that one way to segfault YARV is to run

 Thread.kill(nil) 

In this case, the receiver is the class object Thread, and the parameters are [nil]. (Bug report: http://redmine.ruby-lang.org/issues/show/4367 )

Why (de)serialization is needed

Small Eigen Collider needs serialization for several reasons.

One is that using a random number generator to generate the series of random tasks on every run is not practical. JRuby has a different built-in random number generator, so even with the same PRNG seed it would produce different tasks than YARV. Instead, I create the list of random tasks once (on the first run of ruby bin/small_eigen_collider), have that first run serialize the task list to tasks.yml, and then have subsequent runs of the program (under different Ruby implementations) read tasks.yml to get the task list.

The other reason is that I want to be able to edit the task list. If I have a long list of tasks that leads to a segmentation fault, I want to reduce the list to the minimum required to cause the segmentation fault. For example, with the following bug, https://github.com/evanphx/rubinius/issues/643 ,
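The write-once, read-many flow can be sketched as follows. The hash layout and key names are assumptions made for illustration, not the project's actual tasks.yml format. Method names are stored as plain strings so the file loads under any Psych version without extra options.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical task layout: plain strings and arrays only.
tasks = [
  { "receiver" => "a",   "method_name" => "casecmp", "parameters" => ["b"] },
  { "receiver" => "abc", "method_name" => "index",   "parameters" => ["c"] }
]

loaded_tasks = Dir.mktmpdir do |dir|
  path = File.join(dir, "tasks.yml")

  # First run: generate the tasks once and write them out.
  File.write(path, YAML.dump(tasks))

  # Later runs (possibly under another Ruby implementation): read them back.
  YAML.load(File.read(path))
end

results = loaded_tasks.map do |task|
  task["receiver"].send(task["method_name"].to_sym, *task["parameters"])
end

p results
```

This only covers receivers that are simple values. A class object as receiver is exactly the case the question is about.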

 ObjectSpace.undefine_finalizer(:symbol) 

does not cause a segmentation fault by itself, and neither does

 Symbol.all_symbols.inspect 

but if you put the two together, it does. I started with a thousand tasks, though, and had to pare them down to just these two.

Does deserialization that returns existing class objects make sense in this context, or do you think there is a better way?

2 answers

The Psych maintainer has implemented serialization and deserialization of classes and modules. It's now in Ruby!
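A quick way to check this on a reasonably recent Ruby is sketched below. It assumes a Psych version that includes the class and module support described above. Psych.unsafe_load is used when available, since newer versions restrict what plain Psych.load will instantiate.

```ruby
require "psych"

# Newer Psych versions restrict Psych.load, so prefer unsafe_load if present.
loader = Psych.respond_to?(:unsafe_load) ? :unsafe_load : :load

class_yaml   = Psych.dump(String)
loaded_class = Psych.public_send(loader, class_yaml)

module_yaml   = Psych.dump(Comparable)
loaded_module = Psych.public_send(loader, module_yaml)

puts class_yaml
puts loaded_class.equal?(String)       # the existing class, not a copy
puts loaded_module.equal?(Comparable)
```

Anonymous classes have no name to write out, so this approach only applies to named classes and modules.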


The status quo of my current research:

To achieve the desired behavior, you can use my workaround mentioned above.

Here is the code sample, nicely formatted:

 string_yaml = Psych.dump(Marshal.dump(String)) # => "--- ! \"\\x04\\bc\\vString\"\n"
 string_class = Marshal.load(Psych.load(string_yaml)) # => String

Your hack of altering Class could never work, because handling of real classes is not implemented in Psych/YAML.

You can take a look at the repo tenderlove/psych, which is the standalone lib.

(Gem: psych. To load it, use gem 'psych'; require 'psych' and check with Psych::VERSION.)

As you can see in lines 249-251, objects whose class is an anonymous Class are not handled.

Instead of monkeypatching the Class class, I recommend that you contribute to the Psych lib by extending this class handling.

So, in my opinion, the final YAML result should be something like this: "--- !ruby/class String"

After a night of thinking about it, I can say this feature would be really nice!


Update

Found a tiny solution that seems to work the intended way:

code gist: gist.github.com/1012130 (with descriptive comments)
