Data structure for storing variables in an interpreted language

I am developing my own experimental scripting language with a view to implementing it in my larger application.

Almost everything I wanted to do was programmed smoothly, but the “simple” act of storing variables in memory turned out to be the most difficult here. I do not know how to store them in order to allow all type checks, global variables and special flags. First look at the sample code:

a = 1 b = 2 someFunction() print(a) --> This should read the global variable and print `1` a = 3 --> Now `a` should become a local variable of this function and the global `a` remain unchanged x = 4 --> `x` should always be local of this function end 

I call the "locality" of variables their level , so the variables in the nested blocks have a higher level. In the above code, a and b are level 1 variables. The local variables someFunction will have level 2. The first line of the function should read the global variable a (level 1), but the second line should create a variable again called a , but with level 2, which obscures global a from now on. The third line should create a variable x with level 2. How to store and track all this in memory?

What I have tried so far:

Method 1. Saving variable=>value maps in an array of levels:

 variables { level=1 //global variables { a => 1, b => 2 }, level=2 //function variables { a => 3, x => 4 } } 

But this will make the variable search very slow, since you need to search all levels for a given variable.

Method 2: save pairs (variables, levels) as map keys:

 variables { (a, 1) => 1, //global (b, 1) => 2, //global (a, 2) => 3, //function (x, 2) => 3 //function } 

This has the same problem as before, because we must try a pair (variable, level) with all possible levels for this variable.

Which method should be used for optimal memory usage and quick access?

Additional notes:

I know how variables in the stack and heaps are controlled in other "real" languages, but it's hard for me to do this in an interpreted language. “It doesn't have to be the way Lua and Python do,” I always think. Correct me if I am wrong. I am trying to save a variable in maps and C ++ internal structures.

And finally, this is how I represent the variable. Do you think this is large and might be more memory efficient? (I also tried to put “Level” as a member here, but he had the same problem as the others).

 struct Member { uchar type; //0=num, 1=str, 2=function, 3=array, etc uchar flags; //0x80 = read-only, 0x40 = write-only, etc union { long double value_num; char* value_str; int value_func; //etc }; }; 
+7
source share
2 answers

An easy thing like your array is to maintain a stack of cards. Each map contains bindings for this area. To bind a variable, add it to the top map; to search for a variable starting at the top of the stack and stop when you reach a map containing a binding for that variable. The search takes a bit, but starting from the top / end, you need to search only until you find it - in most cases this search will not be very long.

You can also make the stack implicit by encapsulating this logic in the Environment class, which has local bindings and an inherited environment used to solve unknown variables. Need to move to a new area? Create a new environment with the current environment as its base, use it, and then discard it when the area is finished. The root / global environment may have a zero legacy environment. This is what I will probably do.

+5
source

It is worth noting that if inside the function you do not have access to any variables from the caller's function, it reduces the number of levels that you need to look at. For example:

 variable a; function one() { variable b; // in this function, we can see the global a, local b two(); } function two() { // in this function, we can see the global a, local c // we cannot see the local b of our caller variable c; while (true) { variable d; // here we can see local d, local c, global a } } 

The idea is that the boundaries of functions limit the visibility of variables, and the global area is "special."

At the same time, you can consider the possibility of eliminating the features of global variables, but letting the code indicate that they want to access non-local variables

 variable a; function one() { global a; // or upvar #0 a; variable b; // in this function, we can see the global a, local b two(); } function two() { // in this function, we can see the local c // and the local b of our caller // (since we specifically say we want access to "b" one level up) upvar 1 b; variable c; } 

It looks complicated at first, but it’s very easy to understand once you get used to it (upvar is a construction from the Tcl programming language). What this allows you is access to variables in the area of ​​your caller, but this avoids costly searching by requiring you to specify exactly where this variable comes from (with 1 level one level above the call stack, 2 two levels up , and # 0 is "special", saying "topmost call stack, global"

+2
source

All Articles