Does Ruby's Enumerable#zip create arrays internally?

Question


In Ruby - Compare two Enumerators elegantly, it was said:

The problem with zip is that it creates arrays internally, no matter what Enumerable you pass. There's another problem with the length of the input params.

I looked at the implementation of Enumerable#zip in YARV and saw:

    static VALUE
    enum_zip(int argc, VALUE *argv, VALUE obj)
    {
        int i;
        ID conv;
        NODE *memo;
        VALUE result = Qnil;
        VALUE args = rb_ary_new4(argc, argv);
        int allary = TRUE;

        argv = RARRAY_PTR(args);
        for (i=0; i<argc; i++) {
            VALUE ary = rb_check_array_type(argv[i]);
            if (NIL_P(ary)) {
                allary = FALSE;
                break;
            }
            argv[i] = ary;
        }
        if (!allary) {
            CONST_ID(conv, "to_enum");
            for (i=0; i<argc; i++) {
                argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
            }
        }
        if (!rb_block_given_p()) {
            result = rb_ary_new();
        }
        /* use NODE_DOT2 as memo(v, v, -) */
        memo = rb_node_newnode(NODE_DOT2, result, args, 0);
        rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);

        return result;
    }

Did I understand the following bits correctly?

Check whether all the arguments are arrays, and if so, replace each indirect reference to an array with a direct reference:

    for (i=0; i<argc; i++) {
        VALUE ary = rb_check_array_type(argv[i]);
        if (NIL_P(ary)) {
            allary = FALSE;
            break;
        }
        argv[i] = ary;
    }

If they are not all arrays, convert each argument into an enumerator instead:

    if (!allary) {
        CONST_ID(conv, "to_enum");
        for (i=0; i<argc; i++) {
            argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
        }
    }

Create an array of arrays only if no block is given:

    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }

If they are all arrays, use zip_ary, otherwise use zip_i, and call the block for each set of values:

    /* use NODE_DOT2 as memo(v, v, -) */
    memo = rb_node_newnode(NODE_DOT2, result, args, 0);
    rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);

Return an array of arrays if no block is given, otherwise return nil (Qnil)?

        return result;
    }


c ruby array-merge yarv

Andrew Grimm Jun 27 '11 at 0:54


1 answer

mu is too short · Accepted Answer · 2011-06-27T12:46:02+0000

I'll use 1.9.2-p0, since that's what I have.

The rb_check_array_type function is as follows:

    VALUE
    rb_check_array_type(VALUE ary)
    {
        return rb_check_convert_type(ary, T_ARRAY, "Array", "to_ary");
    }

And rb_check_convert_type looks like this:

    VALUE
    rb_check_convert_type(VALUE val, int type, const char *tname, const char *method)
    {
        VALUE v;

        /* always convert T_DATA */
        if (TYPE(val) == type && type != T_DATA) return val;
        v = convert_type(val, tname, method, FALSE);
        if (NIL_P(v)) return Qnil;
        if (TYPE(v) != type) {
            const char *cname = rb_obj_classname(val);
            rb_raise(rb_eTypeError, "can't convert %s to %s (%s#%s gives %s)",
                     cname, tname, cname, method, rb_obj_classname(v));
        }
        return v;
    }

Note the call to convert_type. This looks a lot like the C version of Array.try_convert, and try_convert happens to look like this:

    /*
     *  call-seq:
     *     Array.try_convert(obj) -> array or nil
     *
     *  Try to convert <i>obj</i> into an array, using +to_ary+ method.
     *  Returns converted array or +nil+ if <i>obj</i> cannot be converted
     *  for any reason. This method can be used to check if an argument is an
     *  array.
     *
     *     Array.try_convert([1])   #=> [1]
     *     Array.try_convert("1")   #=> nil
     *
     *     if tmp = Array.try_convert(arg)
     *       # the argument is an array
     *     elsif tmp = String.try_convert(arg)
     *       # the argument is a string
     *     end
     *
     */
    static VALUE
    rb_ary_s_try_convert(VALUE dummy, VALUE ary)
    {
        return rb_check_array_type(ary);
    }

So yes, the first loop looks for something in argv that is not an array (after trying to convert each argument with to_ary) and clears the allary flag (sets it to FALSE) if it finds such a thing.
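The to_ary conversion performed by rb_check_array_type can be observed from Ruby through Array.try_convert. The snippet below is a small sketch assuming a modern MRI (the answer reads 1.9.2-p0 source, so details may differ between versions); the classes Pair and BrokenPair are hypothetical, made up for illustration. It also shows that an object defining to_ary passes the all-array check, so zip treats it like an array argument.

```ruby
# Hypothetical class that opts into implicit Array conversion via #to_ary.
class Pair
  def initialize(a, b)
    @a = a
    @b = b
  end

  def to_ary
    [@a, @b]
  end
end

# Hypothetical class whose #to_ary returns something that is not an Array.
class BrokenPair
  def to_ary
    "not an array"
  end
end

p Array.try_convert([1])            # already an Array, returned as-is
p Array.try_convert("1")            # String has no #to_ary, so nil
p Array.try_convert(Pair.new(1, 2)) # converted through #to_ary

begin
  Array.try_convert(BrokenPair.new)
rescue TypeError => e
  # the "can't convert ..." branch of rb_check_convert_type
  puts "TypeError: #{e.message}"
end

# Pair passes the to_ary check, so zip handles it like an array argument.
p((1..2).zip(Pair.new(:a, :b)))
```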

In enum.c we see the following:

    id_each = rb_intern("each");

So id_each is an internal reference for the Ruby each iterator method. And in vm_eval.c we have the following:

    /*!
     * Calls a method
     * \param recv   receiver of the method
     * \param mid    an ID that represents the name of the method
     * \param n      the number of arguments
     * \param ...    arbitrary number of method arguments
     *
     * \pre each of arguments after \a n must be a VALUE.
     */
    VALUE
    rb_funcall(VALUE recv, ID mid, int n, ...)

So this:

    argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));

calls to_enum (with, essentially, the default argument :each) on whatever is in argv[i].
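At the Ruby level, that rb_funcall is roughly `arg.to_enum(:each)`. A minimal sketch, assuming a modern MRI, of what such an enumerator gives zip_i to work with: external iteration with next, one element per call, and StopIteration once the underlying each is exhausted. The Range 10..12 is an arbitrary example argument.

```ruby
# Ruby-level counterpart of rb_funcall(argv[i], conv, 1, ID2SYM(id_each)):
# wrap a non-Array argument in an Enumerator over its #each method.
arg = (10..12)
e = arg.to_enum(:each)

values = []
3.times { values << e.next }    # external iteration: one element per #next
p values                        # => [10, 11, 12]

begin
  e.next                        # a fourth call has nothing left to return
rescue StopIteration
  puts "exhausted"
end
```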

So, the end result of the first for and if blocks is that argv is either filled with arrays or filled with enumerators, never a mix of the two. But pay attention to how the logic works: if anything is found that is not an array, then everything becomes an enumerator. So the first part of enum_zip will wrap arrays in enumerators (which is essentially free, or at least cheap enough not to worry about) but will never expand enumerators into arrays (which could be quite expensive). Earlier versions may have gone the other way (preferring arrays over enumerators); I'll leave that as an exercise for the reader or for historians.
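The all-or-nothing behaviour of those two blocks can be transcribed into Ruby roughly as follows. This is a hypothetical sketch, not MRI code: normalize_zip_args is a made-up name, Array.try_convert stands in for rb_check_array_type, and to_enum(:each) stands in for the rb_funcall call. As in the C loop, arguments converted before the first non-array is found keep their converted form and are then wrapped too.

```ruby
# Hypothetical Ruby transcription of the first two blocks of enum_zip.
# Returns [converted_args, allary].
def normalize_zip_args(args)
  args = args.dup                       # enum_zip also works on a copy (rb_ary_new4)
  allary = true
  args.each_index do |i|
    ary = Array.try_convert(args[i])    # same to_ary conversion as rb_check_array_type
    if ary.nil?
      allary = false                    # found a non-array: stop checking
      break
    end
    args[i] = ary
  end
  # One non-array is enough to turn *every* argument into an enumerator;
  # nothing is ever expanded into an array here.
  args.map! { |a| a.to_enum(:each) } unless allary
  [args, allary]
end

converted, allary = normalize_zip_args([[1, 2], 3..4])
p allary                          # => false
p converted.map(&:class)          # => [Enumerator, Enumerator]
```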

Next part:

    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }

creates a new empty array and leaves it in result if zip is called without a block. And here we should note what zip returns:

    enum.zip(arg, ...)                   → an_array_of_array
    enum.zip(arg, ...) {|arr| block }    → nil

If there is a block, then there is nothing to return and result can stay as Qnil; if there is no block, we need an array in result so that there is an array to return.
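The two call forms can be checked directly. A minimal sketch, assuming a modern MRI; a Range is used as the receiver so that Enumerable#zip is the method being called (Array defines its own Array#zip).

```ruby
# Without a block, Enumerable#zip builds and returns the array of arrays.
pairs = (1..3).zip(4..6)
p pairs                          # => [[1, 4], [2, 5], [3, 6]]

# With a block, each row is yielded instead and zip itself returns nil.
seen = []
ret = (1..3).zip(4..6) { |row| seen << row }
p ret                            # => nil
p seen                           # => [[1, 4], [2, 5], [3, 6]]
```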

From parse.c we see that NODE_DOT2 is a two-dot range, but it looks like they're just using the new node as a simple three-element struct; rb_node_newnode just allocates an object, sets some bits, and assigns three values in the struct:

    NODE*
    rb_node_newnode(enum node_type type, VALUE a0, VALUE a1, VALUE a2)
    {
        NODE *n = (NODE*)rb_newobj();

        n->flags |= T_NODE;
        nd_set_type(n, type);

        n->u1.value = a0;
        n->u2.value = a1;
        n->u3.value = a2;

        return n;
    }

nd_set_type is just a small macro. Now we have memo as just a three-element struct. This use of NODE_DOT2 seems to be a convenient kludge.

The rb_block_call function appears to be the internal core iterator. And again we see our friend id_each, so we'll be doing an each iteration over the receiver. Then we see a choice between zip_i and zip_ary; this is where the inner arrays are created and pushed onto result (or yielded to the block). The only difference between zip_i and zip_ary appears to be the StopIteration exception handling in zip_i.
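To make the split between zip_ary and zip_i concrete, here is a hypothetical Ruby transcription of the per-element work. Everything in it is an illustrative assumption rather than MRI code: Memo stands in for the three-slot NODE_DOT2 memo, zip_ary_row, zip_i_row and zip_sketch are made-up names, and the all-array check is simplified to is_a?(Array) instead of the to_ary conversion. The zip_ary-style row indexes each argument array by a running counter; the zip_i-style row pulls with next and treats StopIteration as "this argument is exhausted".

```ruby
# Hypothetical stand-in for the NODE_DOT2 memo:
# result (Array or nil), args (converted arguments), count (rows so far).
Memo = Struct.new(:result, :args, :count)

# zip_ary-style row: every argument is an Array, so index it directly.
def zip_ary_row(val, memo)
  n = memo.count
  memo.count += 1
  [val] + memo.args.map { |ary| ary[n] }   # out-of-range index gives nil
end

# zip_i-style row: arguments are enumerators, pulled one element at a time.
def zip_i_row(val, memo)
  row = [val]
  memo.args.each_with_index do |enum, i|
    if enum.nil?
      row << nil                  # this argument ran out earlier
    else
      begin
        row << enum.next
      rescue StopIteration
        memo.args[i] = nil        # never call #next on it again
        row << nil
      end
    end
  end
  row
end

def zip_sketch(receiver, *args, &block)
  allary = args.all? { |a| a.is_a?(Array) }  # simplified all-array check
  converted = allary ? args : args.map { |a| a.to_enum(:each) }
  memo = Memo.new(block ? nil : [], converted, 0)
  receiver.each do |val|
    row = allary ? zip_ary_row(val, memo) : zip_i_row(val, memo)
    if memo.result
      memo.result << row          # no block: accumulate the array of arrays
    else
      block.call(row)             # block form: hand each row to the caller
    end
  end
  memo.result                     # nil in the block form
end
```

Comparing zip_sketch(1..3, 4..5) with (1..3).zip(4..5) is a quick way to check that the sketch pads exhausted arguments with nil the same way.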

At this point the zipping is done, and we either have an array of arrays in result (if there was no block) or Qnil in result (if there was a block).

Summary: The first loop explicitly avoids expanding enumerators into arrays. The zip_i and zip_ary calls only accumulate the inner arrays into an array of arrays when they have to build one as the return value. So, if you call zip with at least one non-array enumerator and use the block form, then it is enumerators all the way down, and "the problem with zip is that it creates arrays internally" does not happen. Reviewing 1.8 or other Ruby implementations is left as an exercise for the reader.
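The summary's claim can be observed by logging when a block-form zip pulls values from a non-array enumerator. A minimal sketch, assuming a modern MRI; the letters generator and the log array are hypothetical. If the argument were expanded into an array up front, all five "produce" entries would appear before any "block" entry; instead the values are produced one at a time, in step with the block, and d and e are never produced at all.

```ruby
log = []

# Hypothetical generator that records each value as it is produced.
letters = Enumerator.new do |y|
  %w[a b c d e].each do |ch|
    log << "produce #{ch}"
    y << ch
  end
end

# Block form with a non-array argument, i.e. the zip_i path.
(1..3).zip(letters) { |n, ch| log << "block #{n}#{ch}" }

puts log
```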

