Implementing C enum and union in Python

I am trying to understand some C code so that I can port it to Python. The code reads a proprietary binary data file format. Until now this has been straightforward: it was mostly structs, and I used the struct library to read specific C types from the file. However, I have just come across the following bit of code, and I don't understand how to implement it in Python. In particular, I'm not sure how to deal with the enum or the union.

 #define BYTE char
 #define UBYTE unsigned char
 #define WORD short
 #define UWORD unsigned short

 typedef enum {
     TEEG_EVENT_TAB1=1,
     TEEG_EVENT_TAB2=2
 } TEEG_TYPE;

 typedef struct {
     TEEG_TYPE Teeg;
     long Size;
     union {
         void *Ptr;   // Memory pointer
         long Offset;
     };
 } TEEG;

Secondly, in the struct definition below, I'm not sure what the colons after the variable names mean (e.g. KeyPad:4). Does this mean that I am supposed to read 4 bytes?

 typedef struct {
     UWORD StimType;
     UBYTE KeyBoard;
     UBYTE KeyPad:4;
     UBYTE Accept:4;
     long Offset;
 } EVENT1;

In case it is useful, here is an abstract example of how I have been accessing the file in Python:

 from struct import unpack, calcsize

 def get(ctype, size=1):
     """Reads and unpacks binary data into the desired ctype."""
     if size == 1:
         size = ''
     else:
         size = str(size)

     chunk = file.read(calcsize(size + ctype))
     return unpack(size + ctype, chunk)[0]

 file = open("file.bin", "rb")
 file.seek(1234)

 var1 = get('i')
 var2 = get('4l')
 var3 = get('10s')

+4
4 answers

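Enumerations: older versions of Python have no built-in enumeration type (the standard-library enum module only arrived in Python 3.4). Various idioms have been proposed, but none of them is in widespread use. The simplest (and in this case sufficient) solution is to define plain constants: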

 TEEG_EVENT_TAB1 = 1
 TEEG_EVENT_TAB2 = 2

Unions: ctypes has unions (ctypes.Union).
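As a minimal sketch of what that could look like for the TEEG struct from the question: the class names TeegPayload and TeegRecord are made up for illustration, and the 32-bit field widths are an assumption, since the question does not say how the file was written.

```python
import ctypes
import struct

# Assumption: the enum, the long and the pointer were each 4 bytes
# on the machine that wrote the file.
class TeegPayload(ctypes.Union):
    _fields_ = [
        ("Ptr", ctypes.c_uint32),    # raw pointer value; not usable once on disk
        ("Offset", ctypes.c_int32),  # file offset sharing the same 4 bytes
    ]

class TeegRecord(ctypes.Structure):
    _anonymous_ = ("u",)             # expose Ptr and Offset directly on the record
    _fields_ = [
        ("Teeg", ctypes.c_int32),
        ("Size", ctypes.c_int32),
        ("u", TeegPayload),
    ]

# Build 12 bytes in native byte order to stand in for data read from the file.
raw = struct.pack("=iii", 2, 16, 1234)
record = TeegRecord.from_buffer_copy(raw)
print(record.Teeg, record.Size, record.Offset)
```

Both union members read the same bytes, so record.Ptr and record.Offset return the same number here.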

The fieldname : n syntax declares a bit field, and it means "this field is n bits wide" (bits, not bytes). Again, ctypes has them.
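A rough sketch of the EVENT1 struct with ctypes bit fields follows. It assumes a 32-bit long and the layout of the C compiler that Python itself was built with; which half of the byte each field lands in is compiler-specific, so the result should be checked against a file with known contents.

```python
import ctypes
import struct

class Event1(ctypes.Structure):
    _fields_ = [
        ("StimType", ctypes.c_uint16),
        ("KeyBoard", ctypes.c_uint8),
        ("KeyPad", ctypes.c_uint8, 4),   # third tuple item = width in bits
        ("Accept", ctypes.c_uint8, 4),   # shares one byte with KeyPad
        ("Offset", ctypes.c_int32),      # assumption: long was 32 bits
    ]

# Eight bytes in native byte order standing in for data read from the file.
raw = struct.pack("=HBBi", 7, 65, 0xA3, 512)
event = Event1.from_buffer_copy(raw)
print(event.StimType, event.KeyBoard, event.KeyPad, event.Accept, event.Offset)
```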

+8

I don't know the answer to your whole question, but for enums where you don't need to look anything up by value (that is, you are just using them to avoid magic numbers), I like to use a small class. A regular dict is another option that works well. If you need lookup by value, you may want a different structure.

 class TeegType(object):
     TEEG_EVENT_TAB1 = 1
     TEEG_EVENT_TAB2 = 2

 print(TeegType.TEEG_EVENT_TAB1)
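If lookup by value is needed and Python 3.4 or later is available, the standard-library enum module handles both directions. A minimal sketch, reusing the TeegType name from above:

```python
from enum import IntEnum

class TeegType(IntEnum):
    TEEG_EVENT_TAB1 = 1
    TEEG_EVENT_TAB2 = 2

# Look up a member by the integer read from the file.
# An unknown value raises ValueError.
kind = TeegType(2)
print(kind.name)

# IntEnum members still compare equal to plain integers.
print(kind == 2)
```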
+2

What you really need to know:

  • What is the size of the enum? You will use the answer to write your unpacking code.
  • What is the size of the union? Short answer: the size of its largest member.
  • How do you treat the pointer? Take a look at the ctypes module. For what you are doing, it may be easier to work with than the struct module. In particular, it can deal with pointers coming in from C.
  • How do you coerce or cast the data read from a struct into the right type for working with in Python? This is why I recommended ctypes in the bullet above; the module has functions for performing the necessary casts.
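To make the ctypes suggestion a little more concrete, here is a rough sketch of reading a fixed-size record straight into a ctypes structure. The Header layout and the read_struct helper are hypothetical, and io.BytesIO stands in for the real file.

```python
import ctypes
import io
import struct

class Header(ctypes.Structure):   # hypothetical record layout
    _fields_ = [
        ("kind", ctypes.c_int32),
        ("size", ctypes.c_int32),
    ]

def read_struct(fh, struct_type):
    """Read sizeof(struct_type) bytes from fh and return them as struct_type."""
    wanted = ctypes.sizeof(struct_type)
    raw = fh.read(wanted)
    if len(raw) != wanted:
        raise EOFError("file ended in the middle of a record")
    return struct_type.from_buffer_copy(raw)

# Eight bytes in native byte order standing in for the file contents.
fh = io.BytesIO(struct.pack("=ii", 1, 40))
header = read_struct(fh, Header)
print(header.kind, header.size)
```

ctypes.sizeof answers the "how many bytes do I read" question for whatever layout is declared, so the read size and the field definitions cannot drift apart.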
0

A C enum declaration is a syntactic wrapper around some integer type. See "Is sizeof(enum) == sizeof(int), always?". How big that integer is depends on the particular C compiler. I would probably start by guessing 16 bits.

A union reserves a block of memory the size of the largest of its members. Again, the exact size depends on the C implementation, but I would expect 32 bits on a 32-bit architecture, or 64 bits if the code was compiled as native 64-bit code. Generally speaking, you can store the contents of the union in a Python int or long, regardless of whether a pointer or an offset was stored in it.
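On the Python side, some of the guesswork about sizes can be removed: the struct module's format codes have platform-dependent sizes in native mode, but fixed sizes and no padding once a byte-order prefix is given. A quick check (this only shows what Python will read, not what the C compiler originally wrote):

```python
import struct

# Native mode: sizes and padding follow the C compiler Python was built with.
print(struct.calcsize("l"), struct.calcsize("P"), struct.calcsize("hll"))

# Standard mode ("<", ">", "=" or "!"): sizes are fixed and nothing is padded.
print(struct.calcsize("<l"))    # 4
print(struct.calcsize("<q"))    # 8
print(struct.calcsize("<hll"))  # 10
```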

A more interesting question is why a pointer would ever be written to a disk file. You may find that the union field is only treated as a pointer while the TEEG struct is in memory, and that when it is written to disk it is always an integer offset.

As for the :4 notation, as several people have noted, these are "bit fields": sequences of bits, several of which can be packed into a single storage unit. If I remember correctly, bit fields in C are packed into ints, so both of these 4-bit fields will be packed into a single one. They can be unpacked with appropriate use of Python's & (bitwise and) and >> (right shift) operators. Again, exactly how the fields were packed into an integer, and the size of that integer itself, depends on the specific C implementation.
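The masking and shifting could be sketched as follows, assuming the two 4-bit fields share a single byte. The split_nibbles helper is made up for illustration, and whether KeyPad is the low or the high half depends on the compiler that wrote the file.

```python
import struct

def split_nibbles(byte_value):
    """Return (low 4 bits, high 4 bits) of a single byte."""
    low = byte_value & 0x0F           # keep the bottom four bits
    high = (byte_value >> 4) & 0x0F   # shift the top four bits down, then mask
    return low, high

packed = struct.unpack("<B", b"\xA3")[0]
low, high = split_nibbles(packed)
print(low, high)  # 3 10
```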

Perhaps the following code snippet will help you:

 import struct

 SIZEOF_TEEG_TYPE = 2         # First guess for enum is two bytes
 FMT_TEEG_TYPE = "h"          # Could be "b", "B", "h", "H", "l", "L", "q" or "Q"

 SIZEOF_LONG = 4              # Use 8 in 64-bit Unix architectures
 FMT_LONG = "l"               # Use "q" in 64-bit Unix architectures
                              # Life gets more interesting if you are reading 64-bit
                              # using 32-bit Python

 SIZEOF_PTR_LONG_UNION = 4    # Use 8 in any 64-bit architecture
 FMT_PTR_LONG_UNION = "l"     # Use "q" in any 64-bit architecture
                              # Life gets more interesting if you are reading 64-bit
                              # using 32-bit Python

 SIZEOF_TEEG_STRUCT = SIZEOF_TEEG_TYPE + SIZEOF_LONG + SIZEOF_PTR_LONG_UNION
 # "=" keeps native byte order but uses standard sizes and no padding, so the
 # format length matches SIZEOF_TEEG_STRUCT (an assumption about the file layout)
 FMT_TEEG_STRUCT = "=" + FMT_TEEG_TYPE + FMT_LONG + FMT_PTR_LONG_UNION

 # Constants for TEEG_EVENTs
 TEEG_EVENT_TAB1 = 1
 TEEG_EVENT_TAB2 = 2

 # . . .

 # Read a TEEG structure
 teeg_raw = file_handle.read(SIZEOF_TEEG_STRUCT)
 teeg_type, teeg_size, teeg_offset = struct.unpack(FMT_TEEG_STRUCT, teeg_raw)

 # . . .

 # Use TEEG_TYPE information
 if teeg_type == TEEG_EVENT_TAB1:
     pass  # Do something useful
 elif teeg_type == TEEG_EVENT_TAB2:
     pass  # Do something else useful
 else:
     raise ValueError("Encountered illegal TEEG_EVENT type %d" % teeg_type)
0