Cmm for an external primitive (integer-gmp example)

I checked the integer-gmp source code to understand how foreign primitives can be implemented in terms of cmm, as described on the GHC Primops Page . I know about the methods of their implementation using llvm hack or fvia-C / gcc - this is more for study in order to understand this third approach that the interger-gmp library uses.

So, I looked at the CMM tutorial on the MSFT page (pdf link) , went through the GHC CMM page , and there are still some unanswered questions (it's hard to keep all of these concepts in my head without delving into CMM, which I'm doing now). This piece of code exists from the cmm integer-bmp file :

integer_cmm_int2Integerzh (W_ val) { W_ s, p; /* to avoid aliasing */ ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val); p = Hp - SIZEOF_StgArrWords; SET_HDR(p, stg_ARR_WORDS_info, CCCS); StgArrWords_bytes(p) = SIZEOF_W; /* mpz_set_si is inlined here, makes things simpler */ if (%lt(val,0)) { s = -1; Hp(0) = -val; } else { if (%gt(val,0)) { s = 1; Hp(0) = val; } else { s = 0; } } /* returns (# size :: Int#, data :: ByteArray# #) */ return (s,p); } 

As defined in ghc cmm header :

 W_ is alias for word. ALLOC_PRIM_N is a function for allocating memory on the heap for primitive object. Sp(n) and Hp(n) are defined as below (comments are mine): #define WDS(n) ((n)*SIZEOF_W) //WDS(n) calculates n*sizeof(Word) #define Sp(n) W_[Sp + WDS(n)]//Sp(n) points to Stackpointer + n word offset? #define Hp(n) W_[Hp + WDS(n)]//Hp(n) points to Heap pointer + n word offset? 

I don't understand line 5-9 (line 1 is the beginning if you have a 1/0 confusion). More specific:

  • Why is the format of calling the ALLOC_PRIM_N function (bytes, fun, arg) this way?
  • Why is p controlled this way?

The function, as I understand it (from searching for the function signature in Prim.hs ) accepts int and returns an array (int, byte) (stored in s , p respectively, in the code).

For those interested in the built-in call in if block , this is the implementation of cmm gmp mpz_init_si . I assume that if you call a function defined in the object file via ccall, it cannot be inlined (which makes sense, since it is object code, not intermediate code). The LLVM approach is more suitable for embedding through LLVM IR). Thus, the optimization was to determine the cmm representation of the function that should be inlined. Please correct me if this assumption is incorrect.

The explanation of lines 5-9 will be greatly appreciated. I have more questions about other macros defined in the integer-gmp file, but there may be too many questions in one post. If you can answer the question using the Haskell wiki page or blog (you can post the link as an answer), that would be highly appreciated (and if you do, I would also like to step through the integer -gmp cmm step by step, like GMP_TAKE2_RET1 )

+63
haskell ghc
Apr 17 '13 at 18:34
source share
2 answers

These lines highlight the new ByteArray # in the Haskell heap, so to understand them, you first need to know a little about how the GHC heap is handled.

  • Each feature (= the OS thread that runs the Haskell code) has its own dedicated day nursery, an area of ​​the heap into which it makes regular, small allocations like this. Objects are simply allocated sequentially to this area from low addresses to high addresses until they try to make a selection that exceeds the remaining space in the nursery, which the garbage collector launches.

  • All heap objects are aligned with a multiple of the size of the word, i.e. 4 bytes on 32-bit systems and 8 bytes on 64-bit systems.

  • The Cmm Hp level register indicates the (beginning) last word that was highlighted in kindergarten. HpLim points to the last word that can be highlighted in the nursery. (HpLim can also be set to 0 by another thread to stop the world for the GC or send an asynchronous exception.)

  • https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects contains information about the layout of individual heap objects. In particular, each heap object begins with an information pointer that (among other things) identifies what type of heap it is.

The Haskell ByteArray # type is implemented using the ARR_WORDS heap object type. An ARR_WORDS object consists only of (an information pointer followed by) a size (in bytes) followed by arbitrary data (payload). The payload is not interpreted by GC, so it cannot store pointers to Haskell heap objects, but it can store anything else. SIZEOF_StgArrWords is the size of the header common to all objects in the ARR_WORDS heap, in which case the payload is just one word, so SIZEOF_StgArrWords + WDS (1) is the amount of space that we need to allocate.

ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS (1), integer_cmm_int2Integerzh, val) expands to something like

 Hp = Hp + (SIZEOF_StgArrWords + WDS(1)); if (Hp > HpLim) { HpAlloc = SIZEOF_StgArrWords + WDS(1); goto stg_gc_prim_n(integer_cmm_int2Integerzh, val); } 

The first line increases Hp by the amount to be allocated. The second line checks for heap overflow. The third line records the amount we tried to allocate, so GC can cancel it. The fourth line calls GC.

The fourth line is the most interesting. The arguments tell GC how to restart the thread once the garbage collection is complete: it should reinitialize integer_cmm_int2Integerzh with the argument val. "_n" in stg_gc_prim_n (and "_N" in ALLOC_PRIM_N) means that val is not a pointer argument (in this case, Int #). If val was a pointer to a Haskell heap object, the GC should know that it is alive (so it is not going to) and re-output our function with the new object address. In this case, we will use the _p option. There are also options such as _pp for multiple pointer arguments, _d for Double # arguments, etc.

After line 5, we successfully allocated a block of bytes SIZEOF_StgArrWords + WDS (1) and, remember, Hp indicates its last word. So p = Hp - SIZEOF_StgArrWords sets p to the beginning of this block. Lines 8 fill information pointer p, identifying the newly created heap object as ARR_WORDS. CCCS is the current cost center stack, used only for profiling. When profiling is enabled, each heap object contains an additional field that basically determines who is responsible for its distribution. Non-profiling assemblies do not have CCCS, and SET_HDR simply sets the information pointer. Finally, line 9 fills in a ByteArray # sized field. The rest of the function fills the payload and returns the sign value and pointer of the ByteArray # object.

So this has more to do with a bunch of GHCs than with Cmm, but I hope this helps.

+5
Sep 22 '14 at 2:54
source share

enter image description here

Necessary knowledge

To perform arithmetic and logical operations, computers have a digital circuit called ALU (Arithmetic Logical Unit) in its CPU (Central Processing Unit). ALU loads data from input registers . Processor register memory in L1 cache (data requests within three CPU clock pulses) implemented in SRAM (Static Random Access Memory) located in the CPU chip . A processor often contains several types of registers, which usually differ in the number of bits that they can hold .

Numbers expressed in discrete bits may contain a finite number of values. Typically, numbers have the following primitive types represented by a programming language ( in Haskell ):

  8 bit numbers = 256 unique representable values 16 bit numbers = 65 536 unique representable values 32 bit numbers = 4 294 967 296 unique representable values 64 bit numbers = 18 446 744 073 709 551 616 unique representable values 

Fixed-precision arithmetic for these types is implemented at the hardware level . Word size refers to the number of bits that can be processed by a computer processor at a time. For x86 architecture it is 32 bit and x64 it is 64 bit .

IEEE 754 defines the number of floating point numbers that is standard for numbers {16, 32, 64, 128}. For example, a 32-bit point number (with 4,294,967,296 unique values) may contain approximate values [- 3.402823e38 to 3.402823e38] with an accuracy of at least 7 floating point digits .

enter image description here

Besides,

The acronym GMP stands for GNU Multicast Arithmetic Library and adds support for software emulated arithmetic of arbitrary precision. Compiler Glasgow Haskell . Using Integer uses this.

GMP aims to be faster than any other bignum library for all operand sizes. Here are some important factors:

  • Using complete words as the main arithmetic type.
  • Using different algorithms for different sizes of operands; algorithms that are faster for very large numbers are usually slower for small numbers.
  • Highly optimized assembly language code for the most important internal loops specialized for different processors.

Answer

For some Haskell, it may be a little difficult to understand the syntax, so here is the javascript version

 var integer_cmm_int2Integerzh = function(word) { return WORDSIZE == 32 ? goog.math.Integer.fromInt(word)) : goog.math.Integer.fromBits([word.getLowBits(), word.getHighBits()]); }; 

Where goog The Google Closure class used is located in Math.Integer . Called Functions:

 goog.math.Integer.fromInt = function(value) { if (-128 <= value && value < 128) { var cachedObj = goog.math.Integer.IntCache_[value]; if (cachedObj) { return cachedObj; } } var obj = new goog.math.Integer([value | 0], value < 0 ? -1 : 0); if (-128 <= value && value < 128) { goog.math.Integer.IntCache_[value] = obj; } return obj; }; goog.math.Integer.fromBits = function(bits) { var high = bits[bits.length - 1]; return new goog.math.Integer(bits, high & (1 << 31) ? -1 : 0); }; 

This is not entirely correct, since the return type must be return (s,p); Where

  • s is the value
  • p - sign

To fix this GMP shell, you must create. This was done in Haskell for the JavaScript compiler ( source link ).

Lines 5-9

 ALLOC_PRIM_N (SIZEOF_StgArrWords + WDS(1), integer_cmm_int2Integerzh, val); p = Hp - SIZEOF_StgArrWords; SET_HDR(p, stg_ARR_WORDS_info, CCCS); StgArrWords_bytes(p) = SIZEOF_W; 

Follow

  • makes space like a new word
  • creates a pointer to it
  • set pointer value
  • set pointer size
+3
Sep 24 '14 at 6:32
source share



All Articles