C++ interpreter conceptual problem

I wrote an interpreter in C++ for a language I created.

One of the main design problems is that the language has two different types: number and string. So I need to pass around a structure such as:

    class myInterpreterValue {
        myInterpreterType type;
        int intValue;
        string strValue;
    };

Objects of this class are passed around about a million times per second, for example in a countdown loop in my language.

Profiling showed that 85% of the time is eaten by the allocation function of the string template.

This is pretty clear to me: my interpreter is poorly designed and doesn't use pointers enough. However, I don't see an alternative: I cannot use pointers in most cases, since I really do need to make copies.

What can I do about this? Would a class like this be better?

    vector<string> strTable;
    vector<int> intTable;

    class myInterpreterValue {
        myInterpreterType type;
        int locationInTable;
    };

This way the class only knows which type it represents and its position in the table.

This again has disadvantages: I would have to add temporary values to the string/int vector tables and then remove them again, which would again cost a lot of performance.

  • Help, how do interpreters of languages such as Python or Ruby do it? They somehow need a structure that represents a value in the language, something that can be either an int or a string.
+7
c++ design-patterns interpreter
4 answers

I suspect that many values are not strings. So the first thing you can do is get rid of the string object when you don't need it: put it in a union. Another thing is that many of your strings are probably small, so you can avoid the heap allocation by storing small strings in the object itself. LLVM has SmallString for this. And then you can use string interning, as another answer says. For that, LLVM has a StringPool class: call intern("foo") and get a smart pointer referring to a shared string that may also be used by other myInterpreterValue objects.

The union can be written as follows:

    class myInterpreterValue {
        boost::variant<int, string> value;
    };

boost::variant does the type tagging for you. If you don't have Boost, you can implement it as shown below. There is not yet a portable way to query alignment in C++, so we put some types that may require a large alignment into the storage union.

    class myInterpreterValue {
        union Storage {
            // for getting alignment
            long double ld_;
            long long ll_;
            // for getting size
            int i1;
            char s1[sizeof(string)];
            // for access
            char c;
        };
        enum type { IntValue, StringValue } m_type;
        Storage m_store;

        int *getIntP() { return reinterpret_cast<int*>(&m_store.c); }
        string *getStringP() { return reinterpret_cast<string*>(&m_store.c); }

    public:
        myInterpreterValue(string const& str) {
            m_type = StringValue;
            new (static_cast<void*>(&m_store.c)) string(str);
        }
        myInterpreterValue(int i) {
            m_type = IntValue;
            new (static_cast<void*>(&m_store.c)) int(i);
        }
        ~myInterpreterValue() {
            if (m_type == StringValue) {
                getStringP()->~string(); // call destructor
            }
        }

        string &asString() { return *getStringP(); }
        int &asInt() { return *getIntP(); }
    };

You get the idea.
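If a C++17 compiler is available, std::variant covers the same ground as boost::variant without the hand-written storage union. The following is only a sketch under that assumption; the class name Value and its member functions are made up for illustration and are not part of the code above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical C++17 counterpart of the hand-written union:
// std::variant keeps the type tag and runs the string destructor for us.
class Value {
    std::variant<int, std::string> value_;

public:
    Value(int i) : value_(i) {}
    Value(std::string s) : value_(std::move(s)) {}

    bool isInt() const { return std::holds_alternative<int>(value_); }
    bool isString() const { return std::holds_alternative<std::string>(value_); }

    // Accessors check the tag first; std::get would throw on a mismatch anyway.
    int& asInt() { assert(isInt()); return std::get<int>(value_); }
    std::string& asString() { assert(isString()); return std::get<std::string>(value_); }
};
```

An int-valued Value never constructs a string, and copying, assignment and destruction are generated correctly by the compiler.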

+3

I think some dynamic languages cache all equivalent strings at run time with a hash lookup and only store pointers. In each iteration of a loop where the string stays the same, there would then be just a pointer assignment or, at most, one string hash computation. I know some languages (Smalltalk, I think?) do this not only with strings but also with small numbers. See the Flyweight pattern.

IANAE on this. If this does not help, you should post the loop code and walk through how it is interpreted.
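A rough sketch of that kind of string cache, assuming C++11 and the standard std::unordered_set. The names StringInterner, Value, makeString and sameString are invented for this example and do not come from any particular interpreter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical interning table: every distinct string is stored exactly once.
class StringInterner {
    std::unordered_set<std::string> pool_;

public:
    // Returns a pointer to the single shared copy of s. Pointers to elements
    // of an unordered_set stay valid when the table rehashes; only erasing
    // the element invalidates them.
    const std::string* intern(const std::string& s) {
        return &*pool_.insert(s).first;
    }
    std::size_t size() const { return pool_.size(); }
};

// A value now carries only a tag, an int and a non-owning pointer,
// so copying it never allocates.
struct Value {
    enum Type { Int, Str };
    Type type;
    int intValue;
    const std::string* strValue;  // points into the interner, not owned
};

inline Value makeString(StringInterner& pool, const std::string& s) {
    return Value{Value::Str, 0, pool.intern(s)};
}

// Equal strings share one pool entry, so equality is a pointer comparison.
inline bool sameString(const Value& a, const Value& b) {
    assert(a.type == Value::Str && b.type == Value::Str);
    return a.strValue == b.strValue;
}
```

The cost moves to the point where a string value is first created (one hash lookup); strings in this sketch are never removed from the pool, which a real interpreter would have to address.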

+1

In both Python and Ruby, integers are objects. So it is not a question of a “value” being either an integer or a string; it can be anything. In addition, everything in both of these languages is garbage collected. There is no need to copy objects: pointers can be used internally as long as they are stored somewhere the garbage collector will see them.

So, one solution to your problem would be the following:

    class myInterpreterValue {
    public:
        virtual ~myInterpreterValue() {}
        // example of a possible member function
        virtual string toString() const = 0;
    };

    class myInterpreterStringValue : public myInterpreterValue {
        string value;
    public:
        virtual string toString() const { return value; }
    };

    class myInterpreterIntValue : public myInterpreterValue {
        int value;
    public:
        virtual string toString() const {
            char buf[12]; // yeah, int might be more than 32 bits. Whatever.
            sprintf(buf, "%d", value);
            return buf;
        }
    };

Then use virtual calls and dynamic_cast to switch on or check types, instead of comparing against values of myInterpreterType.

The usual thing to do at this point is to worry that virtual function calls and dynamic casts might be slow. Both Ruby and Python use virtual function calls all over the place, although not C++ virtual calls: for both languages, the “standard” implementation is in C, with custom mechanisms for polymorphism. But there is no reason in principle to assume that “virtual” means “performance out the window”.

That said, I expect they both have some clever optimizations for certain uses of integers, including as loop counters. But if you currently see most of your time spent copying empty strings, then virtual function calls are near-instantaneous by comparison.

The real concern is how you plan to manage resources: depending on your plans for your interpreted language, garbage collection may be more trouble than you want.
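As an illustration of checking types with dynamic_cast, here is a sketch of a hypothetical “+” operator. The shortened class names (Value, IntValue, StringValue), the constructors and the add function are assumptions made for this example, and the semantics (numeric addition for two ints, concatenation otherwise) are just one possible choice.

```cpp
#include <memory>
#include <string>

struct Value {
    virtual ~Value() {}
    virtual std::string toString() const = 0;
};

struct IntValue : Value {
    int value;
    explicit IntValue(int v) : value(v) {}
    std::string toString() const override { return std::to_string(value); }
};

struct StringValue : Value {
    std::string value;
    explicit StringValue(const std::string& v) : value(v) {}
    std::string toString() const override { return value; }
};

// Hypothetical "+": dynamic_cast yields a null pointer when the operand is
// not an IntValue, so the check replaces a comparison against a type enum.
std::unique_ptr<Value> add(const Value& a, const Value& b) {
    const IntValue* x = dynamic_cast<const IntValue*>(&a);
    const IntValue* y = dynamic_cast<const IntValue*>(&b);
    if (x && y) {
        return std::unique_ptr<Value>(new IntValue(x->value + y->value));
    }
    // Mixed or string operands: fall back to the virtual toString().
    return std::unique_ptr<Value>(new StringValue(a.toString() + b.toString()));
}
```

Values are handled through pointers here, so passing one around copies a pointer rather than a string; std::unique_ptr stands in for whatever ownership scheme or garbage collector the interpreter ends up using.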

+1

The easiest way to solve this is to make the string member a pointer to a string and allocate it only when creating a string value. You can also use a union to save memory.

    class myInterpreterValue {
        myInterpreterType type;
        union {
            int asInt;
            string* asString;
        } value;
    };
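With a raw pointer in the union, the class has to manage the string's lifetime itself. The following sketch shows one way to do that, assuming deep copies are wanted; the class name Value, the Type enum and the accessor names are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Value {
public:
    enum Type { IntType, StringType };

    Value(int i) : type_(IntType) { value_.asInt = i; }
    Value(const std::string& s) : type_(StringType) {
        value_.asString = new std::string(s);  // allocate only for strings
    }
    // Deep copy: each Value owns its own string.
    Value(const Value& other) : type_(other.type_) {
        if (type_ == StringType) {
            value_.asString = new std::string(*other.value_.asString);
        } else {
            value_.asInt = other.value_.asInt;
        }
    }
    // Copy-and-swap assignment; the old contents die with the parameter.
    Value& operator=(Value other) {
        std::swap(type_, other.type_);
        std::swap(value_, other.value_);
        return *this;
    }
    ~Value() {
        if (type_ == StringType) delete value_.asString;
    }

    Type type() const { return type_; }
    int& asInt() { assert(type_ == IntType); return value_.asInt; }
    std::string& asString() { assert(type_ == StringType); return *value_.asString; }

private:
    Type type_;
    union {
        int asInt;
        std::string* asString;
    } value_;
};
```

Integer values never touch the heap in this layout, but every copy of a string value still allocates, so this mainly helps when most values are numbers.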
0
