How to Lexically Scope Like a Boss
What is a scope?
I’m assuming you have written code before. Hell, I’d dare assume you’ve written a var.
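The original listing isn't shown here, so here's a minimal reconstruction; `printFoo` and `printGlobalFoo` are illustrative names, but the global `foo` holding "super bar" comes straight from the discussion below.

```javascript
// A global variable and a function-local variable with the same name.
var foo = "super bar";

function printFoo() {
  var foo = "local bar"; // shadows the global foo inside this scope
  return foo;            // lookup finds the local entry first
}

function printGlobalFoo() {
  // no local foo here, so lookup walks up to the global scope
  return foo;
}

console.log(printFoo());       // "local bar"
console.log(printGlobalFoo()); // "super bar"
```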
Let’s start with this relatively simple example to begin growing some understanding. Internally, the interpreter/compiler represents scopes with a tree (in simple cases, just a linked list). At each node in the tree is a dictionary that maps the variable names from your source code to their values. When we refer to foo, the interpreter (or the compiler, at compile time) looks for it in the most immediate scope. If it doesn’t find an entry there, it walks up the scope tree until it finds a node with a matching key and uses that value for the current instruction.
Look at that previous code sample. If there wasn’t a var foo at the top of the function, the interpreter would look for scope[foo] and find that it isn’t there. It would then look for an entry named foo in the current scope’s parent. That is the global scope, where the variable foo exists; it refers to the value “super bar.”
Creating scopes and determining the current scope’s parent
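A sketch of the idea, assuming nested immediately-invoked functions (the names a, b, and c are illustrative): each function body opens a fresh scope, and the inner one can see everything above it.

```javascript
var a = "global"; // lives in the root scope

var result = (function () {
  var b = "outer"; // lives in the outer function's scope

  return (function () {
    var c = "inner"; // lives in the inner function's scope
    // lookup walks inner -> outer -> global to resolve all three names
    return [a, b, c].join(" > ");
  })();
})();

console.log(result); // "global > outer > inner"
```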
This creates a tree with three nodes, which we can represent visually in pseudocode.
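One way to picture the interpreter’s bookkeeping is as plain objects, each holding its variables plus a reference to its parent (null at the root). The `lookup` helper here is hypothetical, but it implements exactly the walk-up-the-chain rule described above.

```javascript
// Each scope node: a dictionary of variables plus a parent pointer.
var globalScope = { parent: null,        vars: { a: "global" } };
var outerScope  = { parent: globalScope, vars: { b: "outer" }  };
var innerScope  = { parent: outerScope,  vars: { c: "inner" }  };

// Walk up the chain until a node has a matching entry.
function lookup(scope, name) {
  if (scope === null) throw new ReferenceError(name + " is not defined");
  if (name in scope.vars) return scope.vars[name];
  return lookup(scope.parent, name);
}

console.log(lookup(innerScope, "c")); // found immediately: "inner"
console.log(lookup(innerScope, "a")); // found two nodes up: "global"
```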
As you can see, we have three nodes and there are variables that are available at each node. If you try to access a variable, the interpreter will look up the nodes until it finds an entry that matches.
Let’s go deeper. How about we not even bother with immediately invoking the function? How would that work?
The simple answer is that it changes nothing. The scope chain rules are consistent. The only difference is that we now have two ways variables can enter the scope.
- Through the scope tree: variables in parent scopes are available.
- Through function arguments: values can be injected from whatever scope the function is invoked in.
The scopes themselves aren’t instantiated until the functions are invoked later in the program.
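A hedged sketch of what binder might look like; the scope labels follow the prose below, though the exact listing may have differed.

```javascript
// Scope A: the global scope, where binder itself is defined.
function binder(value) {
  // Scope B: created fresh each time binder is invoked.
  var state = value; // the argument lands in B as `state`

  return {
    get: function () {
      // Scope C: a child of B; reads `state` from its parent scope.
      return state;
    },
    set: function (next) {
      // Scope E: also a child of B; writes `state` in its parent scope.
      state = next;
      return state;
    }
  };
}
```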
Scope order is created lexically. Each new function definition in the source (i.e. the text, hence “lexical”) marks where a new node will be added to the scope chain on invocation. I mentioned that the scope data structure resembles a tree because Scopes C and E both belong to B as siblings. B, in turn, belongs to A.
None of these functions are being invoked yet, but the scope lookup rules are already fixed: A -> B, B -> C, B -> E. Things get hairier when we invoke them.
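Assuming a binder shaped like the one the prose describes (a compact version is repeated inline so this runs on its own), the invocation could look like this:

```javascript
function binder(value) {
  var state = value;
  return {
    get: function () { return state; },
    set: function (next) { state = next; return state; }
  };
}

var b1 = binder(5);     // a fresh Scope B is created, holding state = 5
console.log(b1.get());  // 5 — get's scope reads from B
console.log(b1.set(2)); // 2 — set's scope writes into the same B
console.log(b1.get());  // 2 — B outlived the call, so the value persists
```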
How on earth is this function maintaining state? On invocation, binder takes the argument 5 and creates a scope according to the rules set forth in its source code.
Now, 5 is in Scope B, assigned to the variable “state” we instantiated there. We then return the object back.
The object we get back from calling binder() has two functions, and thus two child scopes of B (C and E), which the return hands back to Scope A.
Now it gets weird. When we call b1.get(), we invoke from Scope A a function whose internal Scope C has Scope B as its parent. It thus returns a value from Scope B back into Scope A. Think of it like a closed loop of scopes, i.e. a closure. ;)
This is pretty powerful. b1.set() does something similar, only it’s injecting a value of 2 from Scope A into Scope E, where it is assigned to a variable in Scope B and subsequently returned. b1.get(), on its next invocation, returns that same stored value from the same scope into Scope A, the top-level context.
We can use this to create objects with completely encapsulated state. This gives us a leg up when trying to wrangle asynchronous processes.
Here’s an example of a function that runs a function passed as an argument after a delay. While we wait for it to fire, we can register functions to be called when it resolves.
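Here’s a minimal sketch of that machinery, assuming a setTimeout-based delay; `defer` and `onDone` are made-up names for illustration, not the original listing.

```javascript
// Run `fn` after a delay, and let callers register callbacks that
// fire once it has resolved.
function defer(fn, delayMs) {
  var resolved = false;
  var result;
  var pending = []; // callbacks registered while we're still waiting

  setTimeout(function () {
    result = fn();
    resolved = true;
    pending.forEach(function (cb) { cb(result); }); // flush the queue
    pending = [];
  }, delayMs);

  return {
    onDone: function (cb) {
      if (resolved) {
        cb(result);       // already resolved: call immediately
      } else {
        pending.push(cb); // still waiting: remember it for later
      }
    }
  };
}

var d = defer(function () { return 42; }, 100);
d.onDone(function (value) { console.log("resolved with", value); });
```

Note that `resolved`, `result`, and `pending` are invisible to the caller; only the closure returned by `defer` can touch them.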
Yeah, it’s a wee bit contrived, but it illustrates the point. By using a closure to wrap a bunch of data into this enclosed space, we can hide away some pretty sophisticated machinery and invert the responsibility of control. Now the object maintains the state of the async call to the file, and we register functions for it to call when the async call resolves. This removes a lot of the overhead of managing async functions.
In Node.js, we pass a callback as the final parameter to the standard library’s async functions.
What if we could apply the same principle and get back an object we could hand functions to, knowing it will take care of running them when the file is available?
Hey, that wasn’t so bad. We return an object of three functions, and it takes care of running the appropriate ones based on the status of the object’s internal async operation once it resolves.
In short, lexical scoping and closures are pretty damn awesome.