Representation of a BER TLV data structure in ANSI C?

Yesterday I learned about representing information in the TLV format.

If you wanted to write a portable BER TLV encoder/decoder in ANSI C, what data structure would you use (*)?

Will something like this do?

    struct TlvElement {
        int nTag;
        int nLength;
        unsigned char *pValue;      /* byte array */
        struct TlvElement *pNext;
    };

(*) Unfortunately, I cannot use C++ and the STL for this.

2 answers

From the wiki article:

Type and length are fixed in size (usually 1-4 bytes)

So I would change nTag and nLength to fixed-width types. The size of int is platform dependent, which may cause problems. Pick sizes that suit your protocol and use int8_t, int16_t, int32_t, etc. (these come from <stdint.h>, which is C99, so a strict C89 compiler needs equivalent typedefs of its own). For nLength you can even use an unsigned type.


Since the value can be of any type, I would use void* for pValue instead of unsigned char*.


How will you use this data structure? How do you want to access the different TLVs?
My point is: do you need a linked list? Is a linked list really the best option for your application and goals?

I'm trying to say that you can remove the pNext element and just treat the TLV as elements of a (dynamically growing) array. It really depends on your needs.
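A growing array of this kind could be sketched roughly as follows. The names TlvArray, tlv_array_init, tlv_array_push and tlv_array_free are invented for this illustration, the fixed-width types assume a C99 <stdint.h>, and the array does not take ownership of the memory behind pValue.

```c
#include <stdint.h>
#include <stdlib.h>

struct TLV {
    uint32_t nTag;
    uint32_t nLength;
    void    *pValue;    /* not owned by the array in this sketch */
};

/* Hypothetical container: a contiguous, growable array of TLVs. */
struct TlvArray {
    struct TLV *items;
    size_t      count;      /* elements in use */
    size_t      capacity;   /* elements allocated */
};

void tlv_array_init(struct TlvArray *a)
{
    a->items = NULL;
    a->count = 0;
    a->capacity = 0;
}

/* Appends a copy of *e. Returns 0 on success, -1 if out of memory. */
int tlv_array_push(struct TlvArray *a, const struct TLV *e)
{
    if (a->count == a->capacity) {
        size_t newCap = a->capacity ? a->capacity * 2 : 4;
        struct TLV *p = realloc(a->items, newCap * sizeof *p);
        if (p == NULL)
            return -1;      /* a->items is still valid and unchanged */
        a->items = p;
        a->capacity = newCap;
    }
    a->items[a->count++] = *e;
    return 0;
}

/* Releases the array storage (but not the pValue buffers). */
void tlv_array_free(struct TlvArray *a)
{
    free(a->items);
    tlv_array_init(a);
}
```

Compared with a linked list, elements are reachable by index and there is no pNext field to keep consistent; the trade-off is an occasional realloc when the array grows.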

Most likely, when implementing TLVs, you will need to send them over some kind of connection, right? If so, you need to think about some kind of protocol. I would send the total number of TLVs at the very beginning, and I would NOT use a linked list but a dynamic array.
You must be careful when sending such a data structure over the network: the pNext pointers will not be valid on the other side of the connection and must be rebuilt there.
You also need to send the data carefully, but I guess you know these things. I just wanted to mention them.
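To illustrate the "count first, no pointers on the wire" idea, here is a minimal sketch. The wire layout (4-byte big-endian count, then a 4-byte tag, 4-byte length and the raw value bytes for each element) is an assumption made up for this example and is not BER; the function names put_u32_be, get_u32_be and tlv_serialize are hypothetical as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct TLV {
    uint32_t nTag;
    uint32_t nLength;   /* number of bytes at pValue */
    void    *pValue;
};

/* Writes v as 4 bytes, most significant first, whatever the host order. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Reads 4 bytes written by put_u32_be. */
uint32_t get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/*
 * Assumed layout for this sketch:
 *   count | tag, length, value bytes | tag, length, value bytes | ...
 * Pointers are never written; only the bytes they point to.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t tlv_serialize(const struct TLV *tlvs, uint32_t count,
                     unsigned char *buf, size_t bufSize)
{
    size_t pos;
    uint32_t i;

    if (bufSize < 4)
        return 0;
    put_u32_be(buf, count);
    pos = 4;

    for (i = 0; i < count; i++) {
        /* room for the 8-byte header plus the value? */
        if (bufSize - pos < 8 || bufSize - pos - 8 < tlvs[i].nLength)
            return 0;
        put_u32_be(buf + pos, tlvs[i].nTag);
        put_u32_be(buf + pos + 4, tlvs[i].nLength);
        pos += 8;
        if (tlvs[i].nLength > 0)
            memcpy(buf + pos, tlvs[i].pValue, tlvs[i].nLength);
        pos += tlvs[i].nLength;
    }
    return pos;
}
```

The receiver would read the count, allocate an array of that many elements, and allocate fresh pValue buffers as it walks the stream, so no pointer value ever crosses the connection.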


EDIT: I see that you have some trouble understanding what a nested TLV means.

A nested TLV is just a TLV element whose value is itself a TLV. This has nothing to do with the TLV "container", be it a dynamic array or a linked list.

Here's an untested example to illustrate the idea. I would do it like this:

    #include <stdint.h>
    #include <stdlib.h>

    struct TLV {
        uint32_t nTag;
        uint32_t nLength;
        void *pValue;
    };

    // create a dynamic array with 3 TLVs:
    struct TLV *pMyTLVs = malloc(3 * sizeof(struct TLV));

    // set the first 2 TLVs, some primitives, for example
    // ..

    // now, set the third TLV to be nested:
    pMyTLVs[2].nTag = ...;      // set some tag that will indicate a nested TLV
    pMyTLVs[2].nLength = ...;   // set length of the TLV element
    pMyTLVs[2].pValue = malloc(sizeof(struct TLV));

    // get a pointer to the new, _nested_ TLV:
    struct TLV *pNested = (struct TLV *)pMyTLVs[2].pValue;

    // now use the pNested TLV as a usual TLV:
    pNested->nTag = ...;
    pNested->nLength = ...;
    pNested->pValue = ...;

    // of course, pNested is not strictly necessary; you could write
    // ((struct TLV *)pMyTLVs[2].pValue)->...; directly,
    // but using pNested makes the code clearer

NOTE: once again, this code is untested, but I think you get the point. Hope this helps.


If I were to write a TLV encoder/decoder in ANSI C, I would choose a proven, standardized, flexible serialization (i.e. wire) format: ASN.1 BER, Thrift, etc.

This is a classic area where wheels are reinvented on a daily basis. Wise people have already thought of solutions that are effective, manageable, and flexible; it makes no sense to repeat the same process.

For example, if the structure in your example were used for serialization, you would still need to consider:

  • Endianness problems
  • Size of language types (the size of int depends on the compiler, platform, and OS)
  • Data type in the payload (you can transfer raw data, strings, numbers, bit fields, enumerations, etc.)
  • Centralized allocation of tag numbers
  • Additional items and options
  • Composite structures (e.g. TLVs nested inside TLVs)

Some existing formats separate semantics from syntax; others let you automatically generate an encoder/decoder from a data schema.

Once you have chosen the serialization format, you can start looking at the in-memory format, which depends heavily on how your application will manipulate the data, for example:

  • How does the application retrieve data after decoding? (For example, given an integer value, does the application access the encoded representation or a native representation that can be used directly?)
  • How does the application prepare data before encoding?
  • Is the application multithreaded?
  • Do you want to minimize copying overhead? (For example, if you have a large amount of raw data, do you need to duplicate it in order to encode it? If the original data is fragmented, do you need to copy it into contiguous memory first?)
  • Can encoding and decoding be done progressively?
  • Who owns the allocated memory: the application or the library?
  • How do you handle errors such as out of memory and unknown tags?
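As one possible answer to several of these questions (copying, memory ownership, error handling), here is a minimal sketch of a BER header decoder that returns a view into the caller's buffer instead of allocating. The names BerStatus, BerView and ber_peek are invented for this illustration and do not come from any library; the indefinite-length form is deliberately left unsupported to keep the sketch short.

```c
#include <limits.h>
#include <stddef.h>

enum BerStatus { BER_OK = 0, BER_TRUNCATED, BER_UNSUPPORTED };

/* A decoded TLV that borrows its value bytes from the input buffer. */
struct BerView {
    unsigned long        tag;          /* tag number */
    unsigned char        tagClass;     /* 0..3, from the top two bits */
    int                  constructed;  /* nonzero: value holds nested TLVs */
    size_t               length;       /* number of value bytes */
    const unsigned char *value;        /* points into the caller's buffer */
};

/*
 * Decodes one definite-length TLV header at buf[0..size).
 * On BER_OK, *consumed is the total size of the element (header + value).
 */
enum BerStatus ber_peek(const unsigned char *buf, size_t size,
                        struct BerView *out, size_t *consumed)
{
    size_t pos = 0;
    size_t n;
    unsigned char b;

    if (size < 1)
        return BER_TRUNCATED;
    b = buf[pos++];
    out->tagClass = (unsigned char)(b >> 6);
    out->constructed = (b & 0x20) != 0;
    out->tag = b & 0x1F;
    if (out->tag == 0x1F) {             /* high-tag-number form */
        out->tag = 0;
        do {
            if (pos >= size)
                return BER_TRUNCATED;
            if (out->tag > (ULONG_MAX >> 7))
                return BER_UNSUPPORTED; /* tag does not fit */
            b = buf[pos++];
            out->tag = (out->tag << 7) | (b & 0x7F);
        } while (b & 0x80);
    }

    if (pos >= size)
        return BER_TRUNCATED;
    b = buf[pos++];
    if ((b & 0x80) == 0) {              /* short form: 0..127 */
        out->length = b;
    } else {                            /* long form: n length bytes follow */
        n = b & 0x7F;
        if (n == 0 || n > sizeof(size_t))
            return BER_UNSUPPORTED;     /* indefinite or too large */
        if (size - pos < n)
            return BER_TRUNCATED;
        out->length = 0;
        while (n-- > 0)
            out->length = (out->length << 8) | buf[pos++];
    }

    if (size - pos < out->length)
        return BER_TRUNCATED;
    out->value = buf + pos;
    *consumed = pos + out->length;
    return BER_OK;
}
```

When constructed is nonzero, the same function can be called again on value and length to walk the nested elements, so nesting needs no extra data structure. The price of the zero-copy design is that every BerView is only valid while the input buffer is.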

I suggest taking a look at the API generated by asn1c for working with ASN.1 BER, or at the libtasn1 API.
