
Ruby Virtual Machine Internals and Investigating Variable Scope


The idea of scope in any programming language is a pretty fundamental concept that – although people often struggle with it at first – quickly becomes second nature; the idea of getting tripped up by an issue with scope in a language you’ve been using for years is laughable! If you want a laugh then keep reading…

When working with an existing piece of code that I wanted to call in a different context I stumbled across some behaviour that I couldn’t explain. This was a Ruby module that was intended to be mixed-in and used in the context of a Chef recipe. I wanted to reuse it elsewhere but it was already called in many different places so I wanted to touch it as little as possible. To that end I passed in a parameter that would be used if a particular instance variable wasn’t available. By way of illustration it was something like:
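The original snippet is not reproduced here, so the following is only a minimal sketch of the shape described. The names RecipeHelper, Recipe and resolve are hypothetical; only new_overriding_parameter and instance_variable come from the description.

```ruby
# Hypothetical reconstruction: module, class and method names are invented.
module RecipeHelper
  def resolve(new_overriding_parameter = nil)
    if new_overriding_parameter
      # Intended to override the attribute, but this is a plain local assignment.
      instance_variable = new_overriding_parameter
    end
    # Intended to fall back to the attribute provided by the including class.
    instance_variable
  end
end

class Recipe
  include RecipeHelper
  attr_accessor :instance_variable

  def initialize
    @instance_variable = "from the recipe"
  end
end

p Recipe.new.resolve("override") # => "override"
p Recipe.new.resolve             # => nil, not "from the recipe"
```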

If a value was passed into the method for new_overriding_parameter then that would be set as the value of instance_variable but, frustratingly, the value of instance_variable was nil when new_overriding_parameter was nil – it appeared that the existing variable, which should have been in scope, was being trampled. After printing out a lot of debugging statements I narrowed it down to the presence of the local assignment in the method. Without that particular line the method behaved as I’d expected it to.

So what is happening here? I had a hunch, but to confirm it I needed to see what instructions were being passed to the Ruby virtual machine. To do this we need to inspect instances of the InstructionSequence class in the RubyVM module [1].

These InstructionSequence objects represent the bytecode instructions sent to the Ruby virtual machine. The Ruby virtual machine was originally written by Koichi Sasada as an external project, called YARV [2]. On 31st of December 2006 Yukihiro Matsumoto merged YARV into the main Ruby project [3], and it has been the default virtual machine for the language ever since.

To create an InstructionSequence object and inspect its instruction sequence we do something like:
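The listing that belongs here is missing, so this is a sketch rather than the original code. It assumes a class T with an attr_accessor for a and a method t, as the later discussion suggests; the parameter name b is hypothetical. RubyVM is specific to MRI (CRuby), and the exact disassembly differs between Ruby versions.

```ruby
# Source to compile; the class mirrors the names mentioned in the text.
# The parameter name b is an assumption made for this sketch.
source = <<'RUBY'
class T
  attr_accessor :a

  def initialize
    @a = 1
  end

  def t(b = nil)
    if b
      a = b
    end
    a
  end
end
RUBY

# Compile the string to bytecode without running it.
iseq = RubyVM::InstructionSequence.compile(source)

# disasm returns a human-readable listing of the instructions.
disassembly = iseq.disasm
puts disassembly
```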

Running that will give us the following output:

The block of text above isn’t the prettiest thing but there should be some familiar bits we can pick out: <class:T>, attr_accessor, initialize. We can infer that these are the parts concerned with defining the class and setting up the class attribute and methods. The part we’re interested in is below the line: == disasm: <RubyVM::InstructionSequence:t@<compiled>>=================== (highlighted above).

It might not be that clear exactly what’s going on but we can see instructions that look like setlocal and getlocal and we can guess that these are operating on local variables.

All of these instructions are defined in a file called insns.def in the Ruby source code [4]. The comment for setlocal does indeed say, “Set a local variable”, so our assumption was valid.

To get something to compare it to, let’s look at what happens when we don’t have the assignment to our a variable; just taking the lines we’re interested in:

An interesting place to look is the line starting with “local table”: in the first example it has size: 3; this time it has size: 2. This table is where Ruby stores all references to local variables [5]. It seems that just by including a local assignment we tell the VM that there is a local variable by this name, which shadows the attribute accessor of the same name. That means that when we refer to a anywhere in this context it refers to the local variable.
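The same effect can be observed without reading bytecode, using Kernel#local_variables. This is a sketch with hypothetical method names (with_assignment, without_assignment) and a hypothetical parameter b; it is not the code from the original investigation.

```ruby
class T
  attr_accessor :a

  def with_assignment(b = nil)
    a = b if b      # never runs when b is nil, but still declares a local named a
    local_variables # lists every local known in this method body
  end

  def without_assignment(b = nil)
    local_variables
  end
end

p T.new.with_assignment    # => [:b, :a]
p T.new.without_assignment # => [:b]
```

The extra entry appears even though the assignment never executes, which matches the larger local table in the first disassembly.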

We can see more evidence of this further down. In the first example, at instruction 0019, there is a getlocal operation; in the second example, at instructions 0013 and 0014, there is a putself (put the current instance onto the stack) followed by an opt_send_without_block with the argument a.
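A stripped-down way to see the same contrast is to compile two tiny snippets and search the disassembly text. This is a sketch, not the original example: it relies on MRI’s RubyVM, and it assumes the listing spells the method name as mid:a, which may vary between Ruby versions.

```ruby
# A bare reference to a after an assignment that can never run.
shadowed = RubyVM::InstructionSequence.compile("a = 1 if false\na").disasm

# A bare reference to a with no assignment anywhere in scope.
plain = RubyVM::InstructionSequence.compile("a").disasm

# With the assignment present, a is read as a local variable.
puts shadowed.include?("getlocal")

# Without it, a is compiled as a method call on self.
puts plain.include?("putself")
puts plain.include?("mid:a")
```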

So it appears that just introducing the assignment to a creates a local variable of that name – regardless of whether or not that assignment is actually executed.

The confusing part of this problem was that the body of the if expression wasn’t being evaluated so there was no way that anything new could have been assigned to the instance variable. But sure enough, the value of a ended up being nil. It was clear after this investigation that we’d actually introduced another variable of the same name.

For an example Ruby script that will create two files with the different bytecode instructions used here, see https://gist.github.com/grahamlyons/b8487540dc283b38b4f7.

(The solution was to introduce a local variable with a new name, having either the method argument or the instance variable assigned to it – in actual fact, much more sane than overwriting the instance variable anyway…)
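A minimal sketch of that fix, assuming the same hypothetical RecipeHelper, Recipe and resolve names used for illustration (they are not from the original code). The new local, effective_value, is also an invented name.

```ruby
module RecipeHelper
  def resolve(new_overriding_parameter = nil)
    # A differently named local cannot shadow the accessor, so
    # instance_variable on the right-hand side is still a method call.
    effective_value = new_overriding_parameter || instance_variable
    effective_value
  end
end

class Recipe
  include RecipeHelper
  attr_accessor :instance_variable

  def initialize
    @instance_variable = "from the recipe"
  end
end

p Recipe.new.resolve("override") # => "override"
p Recipe.new.resolve             # => "from the recipe"
```

Note that || also falls back when the argument is false, which may or may not be wanted.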

When we want to refer to the instance variable’s accessor we can be more explicit by prefixing it with self., which the virtual machine instructions use implicitly as the receiver when the name isn’t in the local table. Interestingly, there is a difference in the bytecode instructions when calling a compared to self.a.

By putting the following code into a file called test.rb and running it with ruby --dump insns ./test.rb (another way to disassemble the code) we can inspect the bytecode:
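The original listing is missing here. The sketch below is a guess at its shape, based only on the description that follows: two calls to the accessor, one bare and one through self, and a final line reading @a directly. The order of the first two lines and the class and method names are assumptions.

```ruby
# test.rb (hypothetical contents)
class T
  attr_accessor :a

  def t
    a      # implicit receiver
    self.a # explicit receiver
    @a     # direct instance variable access, returned from the method
  end
end
```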

The output looks like this:

The pertinent instructions here are 0004 and 0005 which look very similar to 0010 and 0011 – these both include the instruction putself, even though only the first one actually explicitly refers to the local object instance in the Ruby code. Interestingly, there is a difference in the flags passed to these instructions: FCALL|VCALL|ARGS_SIMPLE vs ARGS_SIMPLE. The reason for this difference is left as an exercise for the reader i.e. I don’t know…
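One observable difference between the two call styles can be shown in plain Ruby. It appears to line up with the VCALL flag (a call that could equally be a variable reference), but that link is an assumption here and not something the disassembly itself confirms. The Probe class and its method names are hypothetical.

```ruby
class Probe
  def bare
    undefined_thing      # could be a variable or a method as far as the parser knows
  rescue NameError => e
    e.class
  end

  def explicit
    self.undefined_thing # unambiguously a method call
  rescue NameError => e
    e.class
  end
end

p Probe.new.bare     # => NameError
p Probe.new.explicit # => NoMethodError
```

NoMethodError is a subclass of NameError, so both are caught by the same rescue clause.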

And now for the Columbo moment – just one more thing… Instruction 0016 above accesses the instance variable @a directly, but does nothing with it other than leaving it to be returned from the method. If this line is moved one line up, to where its value won’t be returned, then it doesn’t appear in the instructions to the virtual machine – the compiler optimises it away. Curiouser and curiouser…
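This can be checked with two one-line snippets, as a sketch that assumes MRI’s RubyVM and the instruction name getinstancevariable. In the first, @a is the final expression; in the second, its value is discarded.

```ruby
# @a is the last expression, so its value is needed as the result.
kept = RubyVM::InstructionSequence.compile("@a").disasm

# @a is followed by another expression, so its value is never used.
dropped = RubyVM::InstructionSequence.compile("@a\nnil").disasm

puts kept.include?("getinstancevariable")    # expected true on MRI
puts dropped.include?("getinstancevariable") # expected false on MRI
```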

References

1. Ruby core documentation – http://ruby-doc.org/core-2.2.2/RubyVM/InstructionSequence.html
2. https://en.wikipedia.org/wiki/YARV
3. https://github.com/ruby/ruby/commit/a3e1b1ce7ed7e7ffac23015fc2fde56511b30681
4. https://github.com/ruby/ruby/blob/v2_2_2/insns.def
5. Ruby Under a Microscope, Pat Shaughnessy, p. 46
