
Why array support is postponed

In the release article for the last TeaScript release, I mentioned that the next release would add support for arrays as a container. I have postponed array support, and the next release (right around the corner) will not contain it. In this article I explain why.

First a short recap.
  • The last release added (Named) Tuples, which can also be used as a list/array replacement (without the restriction of one element type per instance).
    (Read more about Tuples in the last release article)
  • The subscript operator [] was not implemented in the last release and should be added in the next release.
  • Arrays were planned to be added in 2 flavors:
    array of TeaScript’s ValueObject (as std::vector<teascript::ValueObject> )
    and as array of primitive types (as e.g., std::vector<double> )

The ValueObject version is the most flexible and has the great advantage that all features of the ValueObject class, and of TeaScript in general, are available for every element of the array. It is also very easy to implement, since this data type is already used in the TeaScript code for passing parameters to a called function and for similar purposes.

The primitive-type version, in contrast, would give maximum performance and the best possible memory layout at the C++ level (the Library). When the data is processed not only in TeaScript but also by fine-grained, highly optimized custom C++ functions, it is best if the C++ code can operate directly on the primitive data within a contiguous memory block. Some algorithms might even be impossible if the data is not placed in one memory block (e.g., think of image data as an RGB(A) or YUV(A) buffer).
But this comes with one major drawback: you cannot “share assign” an element of a vector of primitive types, since this only works with teascript::ValueObject. That means it is impossible to have another ValueObject instance pointing to the same value inside an array. Loops, for example, would either have read-only access (via a copy) or be limited to using an index to access the array.
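
To make the drawback more tangible at the C++ level, here is a minimal sketch. It does not use the real teascript::ValueObject. SharedValue, increment_all_shared and increment_all_copies are hypothetical names, and the sketch simply assumes that a value object holds its data behind a std::shared_ptr so that two handles can refer to the same storage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a script value; NOT the real teascript::ValueObject.
// Assumption: the data lives behind a shared pointer, so handles can alias.
struct SharedValue {
    std::shared_ptr<long long> data;
    explicit SharedValue(long long v) : data(std::make_shared<long long>(v)) {}
};

// "Share assign" flavor: the loop variable e shares storage with the element.
inline void increment_all_shared(std::vector<SharedValue>& arr) {
    for (SharedValue& elem : arr) {
        SharedValue e = elem;          // copies the handle only
        assert(e.data == elem.data);   // both handles point to the same long long
        *e.data += 1;                  // the array element changes as well
    }
}

// Primitive flavor: there is no existing handle to share with, so the loop
// variable can only wrap a copy of the element.
inline void increment_all_copies(std::vector<long long>& arr) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        SharedValue e(arr[i]);         // freshly created, detached from arr[i]
        *e.data += 1;                  // modifies only the temporary
    }                                  // e is destroyed at the end of each cycle
}
```

Calling increment_all_shared on a vector holding 1, 2, 3 leaves it holding 2, 3, 4, while increment_all_copies leaves a std::vector<long long> untouched. Writing back in the second case would need index-based access.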

The following example code illustrates this. It assumes a hypothetical future version of TeaScript which has (distinct) array support and also a forall loop which is able to read and modify elements.

// imagine an array with primitive type (i64 / long long) will be created like this in a fictive future version ...
def arr := [1,2,3]  // array with 3 elements as long long

// imagine a fictive forall loop, here e is a copy(!) of the current element.
forall( e : arr ) {
    println( e )
    e := e + 1    // NOTE: will not change the value of the element inside the array! 
                  //       (probably e will be even const to prohibit this line!)
}

// if the elements in the array would be of the C++ type ValueObject, then this fictive future variant of a forall loop
// could be imagined which is able to change the value of the element inside the array.
// The @ means that the element will be shared with e. (might look different when implemented!)
forall( e @ arr ) {
    e := e  + 1       // now the array element is changed as well.
}

// But since the elements are not ValueObjects (just long longs), there is no existing instance to share data with.
// The e can then only be a newly created ValueObject, which is destroyed again in the next loop cycle.
// Hence, arrays with primitive types can only offer read access in this fictive forall loop.
// (Yes, e could be some more complex type which makes the element change as well,
// but then e must always be something special, with a lot of problems waiting to occur,
// e.g. the element needs to be tracked for 'deletion' even though it is only a long long.)

What is the state now?

The good news is that the upcoming release (just around the corner) will add subscript operator [] support for (Named) Tuples, which can be used for index-based as well as key-based access.

With that, the following is now possible. (Here I limit the example to index-based access, as with arrays. Watch for the next release article for a deeper look.)

// create some tuple
def tup := (1,2,3)

// first the OLD way

// access it by 'parse time' constant index
tup.0        // 1
tup.1        // 2
tup.2        // 3
tup.2 := 4   // is 4 now.

// or if you need to use runtime values (which is most often the case)
_tuple_val( tup, 0 )             // 1
def idx := 1
_tuple_val( tup, idx )           // 2
_tuple_set( tup, idx + 1, 3 )    // is 3 again


// But now you can use the natural [] as well :-)
tup[0]         // 1
tup[1]         // 2
tup[2]         // 3
tup[idx-1]     // 1
tup[idx] := 6  // second element is 6 now.

// This also works with nested tuples, e.g., col[1][3]

Array of ValueObjects ???

With the new subscript operator [], Tuples can be used exactly like arrays from within TeaScript itself.
Would there be any differences if there were a std::vector<teascript::ValueObject> instead of a teascript::Collection<teascript::ValueObject> (= Tuple)?

It is hard to find any. Both can be accessed and used in the C++ layer, both offer the same functionality for the elements (since both use ValueObjects).

One could create an artificial difference by requiring that all ValueObjects in an array have the same inner type. But does this have any advantages? It only leads to a more complex implementation without any benefit. At least, I currently don’t see one.

Is there more?
Yes, the memory layout of a pure std::vector is slightly better, because the Collection class (Source Code) does not store just the ValueObject inside a std::vector but a Key-Value-Pair (where in this case the Key is always a default value and not used). But this should not be relevant in 99.8% of the cases, because most of the overhead comes from the fact that a ValueObject is used instead of the underlying primitive type (or class instance).
The contiguous memory block cannot play a big role either, since there are indirections inside the ValueObject.
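
Both points can be sketched in plain C++. This is an illustration under assumptions, not the actual Collection code: Value, PlainStorage, KeyedStorage and payload_is_inline are hypothetical names, the key type is assumed to be std::string, and the payload is assumed to sit behind a std::shared_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value handle; NOT the real teascript::ValueObject.
// Assumption: the payload is reached through a pointer (one indirection).
struct Value {
    std::shared_ptr<double> payload;
    explicit Value(double v) : payload(std::make_shared<double>(v)) {}
};

// Plain array-like storage: one handle per element.
using PlainStorage = std::vector<Value>;

// Tuple-like storage: every element also carries a key, even if it is unused.
// (std::string as key type is an assumption for this sketch.)
using KeyedStorage = std::vector<std::pair<std::string, Value>>;

// The unused key still costs space in every element.
static_assert(sizeof(KeyedStorage::value_type) > sizeof(PlainStorage::value_type),
              "a key-value pair is larger than the value alone");

// Returns true if the payload of element i lies inside the vector's own
// memory block. With a pointer-based handle it lies in a separate allocation.
inline bool payload_is_inline(const PlainStorage& s, std::size_t i) {
    assert(i < s.size());
    const auto begin = reinterpret_cast<std::uintptr_t>(s.data());
    const auto end = begin + s.size() * sizeof(Value);
    const auto p = reinterpret_cast<std::uintptr_t>(s[i].payload.get());
    return p >= begin && p < end;
}
```

In this sketch the vector of handles is contiguous, but the doubles themselves are not stored in that block, so the contiguity of the container says little about where the actual data lives.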

All of this leads to the conclusion that a std::vector<ValueObject> would not bring any relevant benefits (neither at the TeaScript level nor at the C++ level) but would instead introduce confusion when people start to wonder which type they should use, or start mixing Tuples and Arrays all over the code.

Array of Primitive Types ???

So, what about std::vector< PrimitiveType > ?

First, this can only be relevant at the C++ level. There it could be a very big advantage if you can operate directly on the primitive types within a contiguous memory block instead of using instances of ValueObject.
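
A small sketch of what this advantage could look like. It is not based on the TeaScript API. scale_in_place stands in for any optimized routine that expects one contiguous block, and BoxedDouble is a hypothetical pointer-based value wrapper (an assumption, not the real ValueObject).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a highly optimized routine working on one contiguous block.
inline void scale_in_place(double* data, std::size_t n, double factor) {
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Hypothetical boxed value; NOT the real teascript::ValueObject.
struct BoxedDouble {
    std::shared_ptr<double> value;
    explicit BoxedDouble(double v) : value(std::make_shared<double>(v)) {}
};

// Primitive array: the routine can run directly on the vector's memory.
inline void scale_primitive(std::vector<double>& v, double factor) {
    scale_in_place(v.data(), v.size(), factor);
}

// Boxed array: the values must first be gathered into a temporary
// contiguous buffer and scattered back afterwards.
inline void scale_boxed(std::vector<BoxedDouble>& v, double factor) {
    std::vector<double> tmp;
    tmp.reserve(v.size());
    for (const BoxedDouble& b : v) {
        tmp.push_back(*b.value);                    // gather
    }
    scale_in_place(tmp.data(), tmp.size(), factor);
    for (std::size_t i = 0; i < v.size(); ++i) {
        *v[i].value = tmp[i];                       // scatter
    }
}
```

Both functions produce the same values, but the boxed variant pays for two extra passes and a temporary buffer. For large data such as image buffers this copy can dominate the cost.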

From the point of view of TeaScript code, such an array would only lead to problems, because many language features are not possible for the elements of the array. It would always feel as if the implementation were unfinished and important things were missing (like the missing “share assign” and its consequences, as mentioned above).

For the time being, TeaScript could only offer two primitive types (not counting bool), namely long long and double.
The question then is: is there an existing (strong) need to use one of these primitive types in a std::vector while using TeaScript at the same time?

I think this is (at least currently) not the case. Please let me know if your case is different, and why. I am very interested in learning more about your use case.

The one case where I still believe it is needed is unsigned char as the representation of a buffer. But this is a bigger topic, which will be addressed at some point anyway.

Conclusion and final thoughts

All of this led me to the decision to postpone distinct array support in TeaScript.

From within TeaScript, Tuples are great to use and will serve perfectly as arrays (among other things), at least starting with the upcoming release. The advantages of a std::vector<ValueObject> at the C++ level are too small to accept all the disadvantages inside TeaScript. The same is even more true for std::vector<PrimitiveType>.

What do you think about it? Do you agree, or do you have different thoughts? Feel free to let me know and write a comment below.

Please also watch for the upcoming TeaScript 0.11.0 release, which will be published very soon.

PS: If you are new to TeaScript, you might be interested to read the overview and highlights.
