Thursday, August 31, 2006

Performance: Block Variables Breakdown

In response to my previous blog, Chris Nokleberg noted that if we're using String#equals a lot, interning will have an additional benefit...namely because String#equals short-circuits if both String objects are ==. I had forgotten about that benefit, so I thought I'd poke around for places it might have an effect.

In digging a little, I was reminded that DynamicVariableSet, which holds block-local variables, does a linear search for retrievals and mutations. Compare that to method-local variables in Scope, for which all clients must pass an index to get or set a value. I'm not sure if there's a reason why DynVars must be retrieved by name and Scope vars can be directly indexed...it's probably historical from the C code or a limitation of the current parser. In any case, a linear search could theoretically represent a performance issue accessing or modifying block-local variables. So I thought I'd run some numbers to see how frequently we have to search past the 0 value in the DynVarSet. Here's the breakdown while gem installing rake:

Count of block var retrieves from indicies 0 through 6:
516436
226049
42687
1170
656
122
1

So a total of 787121 block variable lookups, with percentages below:

index 0: 65.6%
index 1: 28.7%
index 2: 5.4%
...with the remainder filling out the last 0.3% of accesses

What can we learn from this? Well DynVarSet currently allocates initial space for up to 8 variables, based on someone's long-past examination of code that showed a maximum of 6 was a good estimate. The numbers above show that, for RubyGems + RDoc at least, a maximum of 3 would cover 99.7% of all cases without requiring expansion of the array, and that with a subsequent expansion to 6 slots we would cover all but 0.0000127% of cases. So using 6 as a maximum and always allocating 8 slots is perhaps a bit generous. More profiling on more code is warranted here, but it might be safe to drop the initial size to 3 and leave in the doubling when expanding.

It also shows us that we pay the cost of two String#equals calls for 28.7% of lookups, and pay for three in 5.4% of lookups, with higher counts statistically insignificant. So although it might seem like we're unlikely to suffer much of a performance hit because most block var lookups are at index 0, our cost to lookup or modify block variables is doubled in over a quarter of cases and tripled in a twentieth of cases. If we normalize all lookups to the same cost (X * 99.7 versus X * 139.2), that translates to a 28% performance boost for block variable lookup cost. Note that I'm not measuring block variable modifications here either, which would conceivably see a similar boost.

And of course what it doesn't show (but which we know to be true) is that if we were able to eliminate both the linear search and the call to String#equals, we'd save a very large percentage of block var lookup time, since we'd reduce it to a simple array indexing. I think it's worth exploring how we could do that--especially since we already do it for Scope in all the same places.

Isn't optimizing legacy code fun?

2 comments:

  1. Actually, classically dynamic variables are always handled like this, since you cannot in the parser know with certainty when (or if, or where) the dynvar will be created. This makes it impossible to index them rationally, so that all code gets the same index.

    Of course, I'm not sure these dynvars are the same as those found in other languages with a distinction between lexical and dynamic variables, but this is the situation in most languages.

    ReplyDelete
  2. You're right, but I'm pretty sure that's not the case here. We can do a static analysis of all block code to determine a base set of variables that will be needed. Additional variables encountered in "eval" calls within a block will have to be added to the list of block variables at runtime, but this mimics what we do already for "eval" calls within a method scope. Ultimately, in Ruby there's very little difference between block variables and local variables except for the fact that variables accessed in a block may have to reference a variable in a higher lexical scoping. Making block variables indexed like method-scoped variables wouldn't limit that behavior, since method-scoped variables end up becoming different node types anyway.

    Even for the case of blocks within blocks, block A on the outside would still have a list of variable names for block B's parse to compare with. Tom claimed to have some almost-working code for doing that analysis at parse time as well, eliminating cases where we'd need to do a search of parent blocks as well by storing that scoping at parse time. This also makes sense, because the scoping of a variable never changes...so if we can determine it at parse time, we don't need to worry about it moving.

    ReplyDelete