Good. Since running the test code does not require any external Ruby libraries, I can compile 1.9 on my machine without installing it and run the test program.
Here is what I see:
- Ruby seems to “freeze” (you cannot interrupt it, and it does not exit on its own).
- top shows that ruby runs at 100% CPU.
- strace does not show any output once it enters 100% CPU mode.
This suggests that Ruby is stuck in an infinite loop. Looking at each_byte in io.c and adding a printf at a suspicious location shows where we get stuck:
    static VALUE
    rb_io_each_byte(VALUE io)
    {
        rb_io_t *fptr;
        char *p, *e;

        RETURN_ENUMERATOR(io, 0, 0);
        GetOpenFile(io, fptr);

        for (;;) {
            p = fptr->rbuf+fptr->rbuf_off;
            e = p + fptr->rbuf_len;
            printf("UH OH: %d < %d\n", p, e);
            while (p < e) {
                fptr->rbuf_off++;
                fptr->rbuf_len--;
                rb_yield(INT2FIX(*p & 0xff));
                p++;
                errno = 0;
            }
            rb_io_check_byte_readable(fptr);
            READ_CHECK(fptr);
            if (io_fillbuf(fptr) < 0) {
                break;
            }
        }
        return io;
    }
On my machine, it prints this:

    UH OH: 0 < 0
    UH OH: 137343104 < 137351296
    UH OH: 137343119 < 137343104
    UH OH: 137343119 < 137343104
    UH OH: 137343119 < 137343104
    ...ad infinitum...

And 137343119 is not less than 137343104, which means that we never enter the while loop again (the loop that yields to the block).
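To make the spinning concrete, here is a minimal toy model of the outer loop. It is a sketch, not Ruby's code: `toy_buf`, `toy_fillbuf`, and `toy_outer_loop` are hypothetical names, and it assumes that the refill step only reads more data (and so can only report EOF) when the buffered length is exactly zero. Under that assumption, a negative length, like the p > e case in the printf output above, skips the inner loop and never reaches EOF either.

```c
/* Hypothetical stand-in for the read-buffer fields of rb_io_t. */
struct toy_buf {
    int off;  /* models rbuf_off */
    int len;  /* models rbuf_len */
};

/* Assumed refill rule: only try to read when the buffer looks empty.
 * Here an empty buffer always means EOF (-1); anything else returns 0. */
static int toy_fillbuf(struct toy_buf *b)
{
    if (b->len == 0) {
        return -1;  /* pretend the file is exhausted */
    }
    return 0;       /* "still have data", even if len is negative */
}

/* Runs the outer loop for at most max_iters passes.
 * Returns the number of passes before EOF, or -1 if it is still
 * spinning when the limit is reached. */
static int toy_outer_loop(struct toy_buf *b, int max_iters)
{
    for (int i = 0; i < max_iters; i++) {
        int p = b->off;
        int e = p + b->len;
        while (p < e) {     /* skipped entirely when len is negative */
            b->off++;
            b->len--;
            p++;
        }
        if (toy_fillbuf(b) < 0) {
            return i + 1;   /* clean exit on EOF */
        }
    }
    return -1;              /* never saw EOF */
}
```

Starting from `{0, 4}` the model exits after one pass; starting from `{15, -15}` (the same 15-byte gap as in the output above) it runs until the iteration cap without changing anything.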
When you run the code so that it does not hang, you get the following:

    UH OH: 0 < 0
    UH OH: 137341560 < 137349752
    UH OH: 137341560 < 137349752
    UH OH: 137341560 < 137349752
    UH OH: 137341560 < 137349752
    ....

And 137341560 IS less than 137349752.
Anyway ... that's all I've got for now. I don't know why this is happening, but at least we now know what is going on. Whoever wrote this code could probably explain why.
I still think that calling lseek somehow messes up Ruby's internal file pointers, and that this is why the loop goes haywire.
EDIT
And here is the fix:
Change flush_before_seek in io.c to look like this:
    static rb_io_t *
    flush_before_seek(rb_io_t *fptr)
    {
        int wbuf_len = fptr->wbuf_len;
        if (io_fflush(fptr) < 0)
            rb_sys_fail(0);
        if (wbuf_len != 0)
            io_unread(fptr);
        errno = 0;
        return fptr;
    }
What I added is the check on wbuf_len != 0, so that we do not call io_unread unnecessarily. Calling io_unread while inside the each_byte loop is what messes things up. Skipping the unread does the job, and all the tests in make test still pass.
In any case, this is not the proper fix, since there seems to be some fundamental logic error around f.pos. It is just a workaround ... but it does fix the problem above :-/
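Here is a toy model of one plausible way the unguarded call could corrupt the loop state. Everything in it is hypothetical (`toy_io`, `toy_unread`, `toy_flush_before_seek`, `toy_each_byte`); it assumes that the unread step zeroes the read-buffer offset and length, and that the block triggers the seek on just one byte. It only illustrates the effect of the wbuf_len guard and is not a reproduction of io.c.

```c
/* Hypothetical stand-in for the buffer fields of rb_io_t. */
struct toy_io {
    int rbuf_off;
    int rbuf_len;
    int wbuf_len;
};

/* Assumed effect of the unread step: discard buffered read data. */
static void toy_unread(struct toy_io *io)
{
    io->rbuf_off = 0;
    io->rbuf_len = 0;
}

/* Models flush_before_seek; guarded != 0 selects the patched behaviour. */
static void toy_flush_before_seek(struct toy_io *io, int guarded)
{
    int wbuf_len = io->wbuf_len;
    io->wbuf_len = 0;                   /* models the flush draining the write buffer */
    if (!guarded || wbuf_len != 0) {
        toy_unread(io);
    }
}

/* Models the inner while loop. The "block" triggers a seek on the byte
 * with index seek_at only. Returns the final rbuf_len. */
static int toy_each_byte(struct toy_io *io, int seek_at, int guarded)
{
    int p = io->rbuf_off;
    int e = p + io->rbuf_len;           /* bounds cached before the block runs */
    for (int i = 0; p < e; i++) {
        io->rbuf_off++;
        io->rbuf_len--;
        if (i == seek_at) {
            toy_flush_before_seek(io, guarded);  /* stands in for the yielded block */
        }
        p++;
    }
    return io->rbuf_len;
}
```

With four buffered bytes, nothing pending in the write buffer, and a seek on the first byte, the unguarded model ends with a length of -3, while the guarded one ends at 0, which is the state the outer loop expects.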