Are recursive types really the only way to create data structures of arbitrary size?

I just noticed a question asking what recursive data types ("self-referential types") are good for in C++, and I was tempted to boldly declare:

This is the only way to build data structures (more precisely, containers) that can accept arbitrarily large collections of data without using contiguous areas of memory.

That is, if you don't have random-access arrays, you need some means of referring (logically) to the type within itself (obviously, instead of a MyClass* next member you could use void* next, but that would still point to a MyClass object or a derived type).
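
A minimal sketch of the kind of self-referential type I mean (the names are just for illustration):

    // A type that refers to itself: each node stores the address of the
    // next node of the same type, so a collection can grow node by node
    // in non-contiguous memory.
    struct MyClass {
        int      data;
        MyClass* next;  // member whose type refers to the enclosing type
    };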

However, I am wary of absolute statements; just because I could not come up with something does not mean it is impossible. So, am I missing something? Are there data structures that are not organized using linked-list/tree-like mechanisms and do not rely exclusively on contiguous sequences?


Note: I'd be specifically interested in C++, but also in the theoretical aspects.

+7
5 answers

This is the only way to build data structures (more precisely, containers) that can accept arbitrarily large collections of data without using contiguous areas of memory.

After some reflection, this statement seems correct. In fact, this is self-evident.

Suppose I have a collection of elements in non-contiguous memory, and suppose I am at element e. Now the question is: how do I know the next element in the collection? Is there any way at all?

Given an element e from the collection, there are only two ways to compute the location of the next element:

  • I assume it is at offset sizeof(e), regardless of what e is, which means the next element starts where the current element ends. But then the collection is in contiguous memory, which is prohibited in this discussion.

  • The element e itself tells us the location of the next element. It can store an address or an offset. Either way, it uses the concept of self-reference, which is also prohibited in this discussion.

As far as I can see, the basic idea of both approaches is exactly the same: both implement self-reference. The only difference is that in the first case the self-reference is implicit, using sizeof(e) as the offset; this implicit self-reference is supported by the language itself and implemented by the compiler. In the second case it is explicit, done entirely by the programmer, since the offset (or pointer) is stored in the element itself.
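
Here is a small sketch of the two flavors, assuming a trivial element type (the names are illustrative):

    struct Elem { int value; };

    // Implicit self-reference: the next element is assumed to start where
    // the current one ends, i.e. at an offset of sizeof(Elem). This only
    // works if the elements sit in contiguous memory.
    Elem* next_contiguous(Elem* e) {
        return e + 1;  // pointer arithmetic advances by sizeof(Elem)
    }

    // Explicit self-reference: the element itself stores the location of
    // the next element, which may be anywhere in memory.
    struct Node {
        int   value;
        Node* next;
    };

    Node* next_linked(Node* n) {
        return n->next;
    }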

Therefore, I do not see a third approach. And if not self-reference, what term would even describe computing the location of the next element from element e?

So, my conclusion: your statement is absolutely correct.

+4

The problem is that the dynamic allocator itself manages contiguous storage. Think of the "tape" of a Turing machine, or of the von Neumann architecture. So, to consider the problem seriously, you would probably have to devise a new computational model and a new computer architecture.

If you think it is okay to ignore the contiguous memory of the underlying machine, I'm sure several solutions are possible. The first that comes to mind: mark each node of the container with an identifier that has nothing to do with its position in memory. Then, to find a linked node, scan all of memory until that identifier is found. This is not even particularly inefficient if enough processing elements are available on a parallel machine.
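
A minimal sketch of that idea, with a fixed arena standing in for "all memory" (Node, arena, and find are made-up names, not from any real API):

    #include <array>
    #include <cstddef>

    struct Node {
        int  id;       // identifier unrelated to the node's memory position
        int  next_id;  // logical link: the id of the successor, not an address
        char payload;
        bool used;
    };

    constexpr std::size_t kArenaSize = 1024;
    std::array<Node, kArenaSize> arena{};  // stands in for "all memory"

    // Resolving a logical link means scanning the whole arena -- no address
    // arithmetic and no stored pointers, just identifier matching.
    Node* find(int id) {
        for (Node& n : arena)
            if (n.used && n.id == id) return &n;
        return nullptr;
    }

    Node* next(const Node& n) { return find(n.next_id); }

Each comparison is independent of the others, which is why the scan parallelizes well.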

+2

Here is a sketch of a proof.

Given that the program must be of finite size, all types defined in the program can contain only a finite number of members and can refer to only a finite number of other types. The same goes for the program's entry point and for any objects defined before the program starts.

In the absence of contiguous arrays (which are the product of a type with a runtime natural number, and therefore unbounded in size), all types must be obtained through composition of types, as above; derived types (pointer-to-pointer-to-A) are still limited by the size of the program. There is no facility other than contiguous arrays for combining a runtime value with a type.

This is a bit contentious; if, for example, comparisons are considered primitive, you can approximate an array with a map whose keys are natural numbers. But of course any map implementation must itself use either self-referential data structures (B-trees) or contiguous arrays (hash tables).

Further, if types are non-recursive, then any chain of types (A refers to B refers to C ...) must terminate, and can be no longer than the number of types defined in the program. Therefore, the total amount of data reachable by the program is bounded by the product of the sizes of all the types, multiplied by the number of names defined in the program (at its entry point and in static data).
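
A tiny illustration of the terminating chain (types invented for the example):

    // Each type refers only to types "below" it, so every chain bottoms out.
    struct C { int value; };
    struct B { C* c; };   // a B can reach at most one C
    struct A { B* b; };   // an A can reach at most one B, hence one C

    // Data reachable from one named A is fixed at compile time:
    // at most sizeof(A) + sizeof(B) + sizeof(C), whatever happens at runtime.

    // struct R { R* next; };  // recursive: a chain of Rs has no static bound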

This bound holds even if functions are recursive (which, strictly speaking, violates the prohibition on recursive types, since functions are types); the amount of data immediately visible at any point in the program is still bounded by the product of the sizes of all the types, multiplied by the number of names visible at that point.

An exception is storing the "container" in the stack of recursive function calls; however, such a program would not be able to access its data at random without unwinding the stack and re-reading it, which is a bit of a disqualification.

Finally, if it is possible to create types dynamically, the above proof no longer holds; we could, for example, create a Lisp-style list structure where each cell has a distinct type: cons<4>('h', cons<3>('e', cons<2>('l', cons<1>('l', cons<0>('o', nil))))). This is not possible in most statically typed languages, although it is possible in some dynamic languages, e.g. Python.
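
In C++ such a depth-indexed chain can be spelled out with templates, though every type is still fixed at compile time, so it does not escape the finiteness argument. A hypothetical sketch (cons, nil, and make_cons are invented names):

    #include <iostream>

    struct nil {};

    // Each depth N is a distinct instantiation, i.e. a separate type.
    template <int N, typename Tail>
    struct cons {
        char head;
        Tail tail;
    };

    template <int N, typename Tail>
    cons<N, Tail> make_cons(char head, Tail tail) { return {head, tail}; }

    int main() {
        auto list = make_cons<4>('h',
                    make_cons<3>('e',
                    make_cons<2>('l',
                    make_cons<1>('l',
                    make_cons<0>('o', nil{})))));
        std::cout << list.head << list.tail.head << list.tail.tail.head
                  << list.tail.tail.tail.head
                  << list.tail.tail.tail.tail.head << '\n';  // prints "hello"
    }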

+1

The statement is incorrect. A simple counterexample in C++ is std::deque. Its main data structure (for the language-agnostic part) is a contiguous array of pointers to blocks of data. The actual data is stored in disjoint blocks, which are tied together through the contiguous pointer array.


This may be borderline given your requirements, depending on what it means to have no contiguous memory areas. I am using the interpretation that the stored data is not contiguous, even though the structure depends on an array for the intermediate layer.
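
A minimal sketch of that layout with fixed-size blocks (block_deque is a made-up name; the real std::deque is considerably more elaborate):

    #include <cstddef>
    #include <vector>

    template <typename T, std::size_t BlockSize = 8>
    class block_deque {
    public:
        block_deque() = default;
        block_deque(const block_deque&) = delete;  // copying omitted for brevity

        void push_back(const T& value) {
            if (size_ % BlockSize == 0)            // current block full (or none yet)
                map_.push_back(new T[BlockSize]);  // fresh disjoint block
            map_[size_ / BlockSize][size_ % BlockSize] = value;
            ++size_;
        }

        T& operator[](std::size_t i) {             // O(1) random access
            return map_[i / BlockSize][i % BlockSize];
        }

        ~block_deque() {
            for (T* block : map_) delete[] block;
        }

    private:
        std::vector<T*> map_;   // the contiguous array of pointers
        std::size_t size_ = 0;
    };

Only the pointer map is contiguous; the elements themselves live in disjoint allocations.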

0

I think a better phrasing would be:

 It is the only way to construct data structures (more precisely, containers) that can accept arbitrarily large data collections without using memory areas of determinable address.

What I mean is: regular arrays use addr(idx) = idx*size + initial_addr to get the memory address of an element. But if you change this to something like addr(idx) = idx*idx*size + initial_addr, then the elements of the data structure are no longer stored in a contiguous memory area; there are large gaps between where the elements are stored. So this is not contiguous memory.
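
A toy sketch of that addressing scheme (quadratic_array is an invented name; note that the backing buffer is still one contiguous allocation as far as the allocator is concerned):

    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Elements are placed at addr(idx) = base + idx*idx*sizeof(T), so they
    // are scattered through the buffer with growing gaps between them, yet
    // every address is computable from the index alone.
    // (Only safe for trivial T; a real implementation would use placement new.)
    template <typename T>
    class quadratic_array {
    public:
        explicit quadratic_array(std::size_t capacity)
            : storage_(((capacity - 1) * (capacity - 1) + 1) * sizeof(T)) {}

        T& operator[](std::size_t idx) {
            return *reinterpret_cast<T*>(storage_.data() + idx * idx * sizeof(T));
        }

    private:
        std::vector<unsigned char> storage_;  // zero-initialized byte buffer
    };

    int main() {
        quadratic_array<int> a(10);
        for (std::size_t i = 0; i < 10; ++i) a[i] = static_cast<int>(i);
        assert(a[7] == 7);
    }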

0
