My goal is to analyze source code in different languages (mainly C, C++, Obj-C and Haskell) and report statistics about it (e.g. number of variables, functions, memory allocations, complexity, etc.).
LLVM seemed like the perfect tool for this, because I can generate bitcode for these languages and, with custom LLVM passes, do almost anything. It works fine for the C family. Take this C program (test.c), for example:
    #include <stdio.h>

    int main()
    {
        int num1, num2, sum;
        printf("Enter two integers: ");
        scanf("%d %d", &num1, &num2);
        sum = num1 + num2;
        printf("Sum: %d", sum);
        return 0;
    }
Then I run:
    clang -emit-llvm test.c -c -o test.bc
    opt -load [MY AWESOME PASS] [ARGS]
Voila, I have almost everything I need:
     1 instcount - Number of Add insts
     4 instcount - Number of Alloca insts
     3 instcount - Number of Call insts
     3 instcount - Number of Load insts
     1 instcount - Number of Ret insts
     2 instcount - Number of Store insts
     1 instcount - Number of basic blocks
    14 instcount - Number of instructions (of all types)
    12 instcount - Number of memory instructions
     1 instcount - Number of non-external functions
I would like to achieve the same with Haskell programs. Take test.hs:
    module Test where

    quicksort [] = []
    quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
        where
            lesser  = filter (< p) xs
            greater = filter (>= p) xs
However, when I run:
    ghc -fllvm -keep-llvm-files -fforce-recomp test.hs
    opt -load [MY AWESOME PASS] [ARGS]
I get the following results, which seem completely useless for my purposes (mentioned at the beginning of this post), because they clearly do not correspond to these few lines of code. I assume this has something to do with GHC, because the .ll file it generates is 52 KB, while the .ll file for the C program is only 2 KB.
      31 instcount - Number of Add insts
      92 instcount - Number of Alloca insts
       2 instcount - Number of And insts
      30 instcount - Number of BitCast insts
      24 instcount - Number of Br insts
      22 instcount - Number of Call insts
     109 instcount - Number of GetElementPtr insts
      17 instcount - Number of ICmp insts
      54 instcount - Number of IntToPtr insts
     326 instcount - Number of Load insts
      65 instcount - Number of PtrToInt insts
      22 instcount - Number of Ret insts
     206 instcount - Number of Store insts
       8 instcount - Number of Sub insts
      46 instcount - Number of basic blocks
    1008 instcount - Number of instructions (of all types)
     755 instcount - Number of memory instructions
      10 instcount - Number of non-external functions
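For what it's worth, one rough way to put the two outputs on the same scale is to read each statistic line and express it as a share of the total instruction count. The sketch below assumes every line has the form "<count> instcount - Number of <what>", as in the outputs in this post; the helper names parse_instcount_line and percent_of are made up for illustration and are not part of LLVM. It only rescales the numbers; it does not explain or remove the extra IR that GHC generates.

```c
#include <stdio.h>
#include <string.h>

/* Parse one statistics line of the (assumed) form
 *     "<count> instcount - Number of <what>"
 * On success, store the count in *count, copy <what> into name
 * (truncated to name_size - 1 characters) and return 1.
 * Return 0 if the line does not match that form. */
int parse_instcount_line(const char *line, long *count,
                         char *name, size_t name_size)
{
    static const char marker[] = " instcount - Number of ";
    int consumed = 0;

    /* Read the leading number and remember how many characters it took. */
    if (name_size == 0 || sscanf(line, " %ld%n", count, &consumed) != 1)
        return 0;

    /* The fixed text must follow the number directly. */
    if (strncmp(line + consumed, marker, sizeof marker - 1) != 0)
        return 0;

    /* Everything up to the end of the line is the statistic's name. */
    const char *what = line + consumed + (sizeof marker - 1);
    size_t len = strcspn(what, "\r\n");
    if (len >= name_size)
        len = name_size - 1;
    memcpy(name, what, len);
    name[len] = '\0';
    return 1;
}

/* Share of count in total as a percentage; 0 if total is not positive. */
double percent_of(long count, long total)
{
    return total > 0 ? 100.0 * (double)count / (double)total : 0.0;
}
```

With the numbers above, loads come out at roughly 21% of all instructions for the C program (3 of 14) and roughly 32% for the Haskell module (326 of 1008), which is at least easier to compare than the raw counts.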
My question is: how can I compare Haskell code with code in other languages without getting these huge numbers? Is it even possible? Should I keep using GHC to generate the LLVM IR? What other tools could I use?