CLOS make-instance is really slow and causes heap exhaustion in SBCL

I am writing a multi-architecture assembler / disassembler in Common Lisp (SBCL 1.1.5 on 64-bit Debian GNU / Linux), the assembler is currently generating the correct code for the x86-64 subset. To build x86-64 assembly code, I use a hash table in which the mnemonics (lines) of the build command, such as "jc-rel8" and "stosb" , are keys that return a list of 1 or more encoding functions, for example, the following :

  (defparameter * emit-function-hash-table-x64 * (make-hash-table: test 'equalp))
 (setf (gethash "jc-rel8" * emit-function-hash-table-x64 *) (list # 'jc-rel8-x86))
 (setf (gethash "stosb" * emit-function-hash-table-x64 *) (list # 'stosb-x86))

Encoding functions are similar to these (some of them are more complex):

  (defun jc-rel8-x86 (arg1 & rest args)
   (jcc-x64 # x72 arg1))

 (defun stosb-x86 (& rest args)
   (list #xaa))

Now I am trying to enable the full x86-64 command set using NASM (NASM 11/02/06) command encoding data ( insns.dat file) converted to Common Lisp CLOS syntax. This would mean replacing the regular functions used to emit binary code (e.g. the functions described above) with instances of the custom x86-asm-instruction class (still a very simple class, about 20 slots with :initarg :reader :initform , etc. e ..), in which the emit method with arguments will be used to emit binary code for the given command (mnemonics) and arguments. The converted command data looks like this (but it's more than 40,000 lines and exactly 7193 make-instance and 7193 setf ).

 ;;  first mnemonic + operand combination instances (: is-variant t).
 ;;  there are 4928 such instances for x86-64 generated from NASM insns.dat.

 (eval-when (: compile-toplevel: load-toplevel: execute)

 (setf Jcc-imm-near (make-instance 'x86-asm-instruction
 : name "Jcc"
 : operands "imm | near"
 : code-string "[i: odf 0f 80 + c rel]"
 : arch-flags (list "386" "BND")
 : is-variant t))

 (setf STOSB-void (make-instance 'x86-asm-instruction
 : name "STOSB"
 : operands "void"
 : code-string "[aa]"
 : arch-flags (list "8086")
 : is-variant t))

 ;;  then, container instances which contain (or could be refer to instead)
 ;;  the possible options of each instruction.
 ;;  there are 2265 such instances for x86-64 generated from NASM insns.dat.

 (setf Jcc (make-instance 'x86-asm-instruction
                          : name "Jcc"
                          : is-container t
                          : options (list Jcc-imm-near
                                          Jcc-imm64-near
                                          Jcc-imm-short
                                          Jcc-imm
                                          Jcc-imm
                                          Jcc-imm
                                          Jcc-imm)))

 (setf STOSB (make-instance 'x86-asm-instruction
                            : name "STOSB"
                            : is-container t
                            : options (list STOSB-void)))

 ;;  thousands of objects more here ...

 );  this bracket closes (eval-when (: compile-toplevel: load-toplevel: execute)

I converted NASM insns.dat to Common Lisp syntax (e.g. above) using the trivial Perl script (further below, but nothing interesting in the script) and it basically does not work. So this works, but compiling these 7193 objects is really very slow and usually causes heap depletion. On my Linux Core i7-2760QM laptop with 16 GB of memory, compiling code code (eval-when (:compile-toplevel :load-toplevel :execute) with 7193 objects like above takes more than 7 minutes and sometimes causes heap exhaustion , eg:

  ;;  Swank started at port: 4005.
 * Heap exhausted during garbage collection: 0 bytes available, 32 requested.
  Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB! Move Alloc Waste Trig WP GCs Mem-age
    0: 0 0 0 0 0 0 0 0 0 0 0 0 4,194,340 0 0 0.0000
    1: 0 0 0 0 0 0 0 0 0 0 0 41943 040 0 0 0.0000
    2: 0 0 0 0 0 0 0 0 0 0 0 0 41943 040 0 0 0.0000
    3: 38805 38652 0 0 49474 15433 389 416 0 2144219760 9031056 1442579856 0 1 1.5255
    4: 127998 127996 0 0 45870 14828 106 143 199 1971682720 25428576 2,000,000 0 0 0.0000
    5: 0 0 0 0 0 0 0 0 0 0 0 0 2,000,000 0 0 0.0000
    6: 0 0 0 0 1178 163 0 0 0 43941888 0 2,000,000 985 0 0.0000
    Total bytes allocated = 4159844368
    Dynamic-space-size bytes = 4194304000
 GC control variables:
    * GC-INHIBIT * = true
    * GC-PENDING * = in progress
    * STOP-FOR-GC-PENDING * = false
 fatal error encountered in SBCL pid 9994 (tid 46912556431104):
 Heap exhausted, game over.

 Welcome to LDB, a low-level debugger for the Lisp runtime environment.
 ldb>

I had to add the --dynamic-space-size 4000 option for SBCL to compile it at all, but still after allocating 4 gigabytes of dynamic heap of space, it sometimes --dynamic-space-size 4000 out. Even if the heap exhaustion were resolved, more than 7 minutes to compile 7193 instances after adding only the slot in the class (the class 'x86-asm-instruction used for these instances) is too much for interactive development in REPL (I use slimv if it has value).

Here (time (compile-file output:

  ;  caught 18636 WARNING conditions


 ;  insns.fasl written
 ;  compilation finished in 0: 07: 11.329
 Evaluation took:
   431.329 seconds of real time
   238.317000 seconds of total run time (234.972000 user, 3.345000 system)
   [Run times consist of 6.073 seconds GC time, and 232.244 seconds non-GC time.  ]
   55.25% CPU
   50,367 forms interpreted
   784,044 lambdas converted
   1,031,842,900,608 processor cycles
   19,402,921,376 bytes consed

Using OOP (CLOS) will enable command mnemonics (for example, jc or stosb above stosb :name )), allowed operands of a command ( :operands ), binary encoding of commands (for example, #xaa for stosb :code-string ) and possible architectural restrictions ( :arch-flags ) commands in one object. But it seems that at least my three-year-old computer is not efficient enough to quickly compile about 7000 instances of CLOS objects.

My question is: is there a way to make SBCL make-instance faster, or should I continue to generate assembly code in normal functions, like in the examples above? I would also be very happy to know about any other possible solutions.

Here's a Perl script, just in case:

 #!/usr/bin/env perl use strict; use warnings; # this program converts NASM `insns.dat` to Common Lisp Object System (CLOS) syntax. my $firstchar; my $line_length; my $are_there_square_brackets; my $mnemonic_and_operands; my $mnemonic; my $operands; my $code_string; my $flags; my $mnemonic_of_current_mnemonic_array; my $clos_object_name; my $clos_mnemonic; my $clos_operands; my $clos_code_string; my $clos_flags; my @object_name_array = (); my @mnemonic_array = (); my @operands_array = (); my @code_string_array = (); my @flags_array = (); my @each_mnemonic_only_once_array = (); my @instruction_variants_array = (); my @instruction_variants_for_current_instruction_array = (); open(FILE, 'insns.dat'); $mnemonic_of_current_mnemonic_array = ""; # read one line at once. while (<FILE>) { $firstchar = substr($_, 0, 1); $line_length = length($_); $are_there_square_brackets = ($_ =~ /\[.*\]/); chomp; if (($line_length > 1) && ($firstchar =~ /[^\t ;]/)) { if ($are_there_square_brackets) { ($mnemonic_and_operands, $code_string, $flags) = split /[\[\]]+/, $_; $code_string = "[" . $code_string . "]"; ($mnemonic, $operands) = split /[\t ]+/, $mnemonic_and_operands; } else { ($mnemonic, $operands, $code_string, $flags) = split /[\t ]+/, $_; } $mnemonic =~ s/[\t ]+/ /g; $operands =~ s/[\t ]+/ /g; $code_string =~ s/[\t ]+/ /g; $flags =~ s/[\t ]+//g; # we don't want non-x86-64 instructions here. unless ($flags =~ "NOLONG") { # ok, the content of each field is now filtered, # let convert them to a suitable Common Lisp format. $clos_object_name = $mnemonic . "-" . $operands; # in Common Lisp object names `|`, `,`, and `:` must be escaped with a backslash `\`, # but that would get too complicated. # so we'll simply replace them: # `|` -> `-`. # `,` -> `.`. # `:` -> `.`. $clos_object_name =~ s/\|/-/g; $clos_object_name =~ s/,/./g; $clos_object_name =~ s/:/./g; $clos_mnemonic = "\"" . $mnemonic . "\""; $clos_operands = "\"" . $operands . "\""; $clos_code_string = "\"" . $code_string . "\""; $clos_flags = "\"" . $flags . "\""; # add first and last double quotes. $clos_flags =~ s/,/" "/g; # make each flag its own Common Lisp string. $clos_flags = "(list " . $clos_flags. ")"; # convert to `list` syntax. push @object_name_array, $clos_object_name; push @mnemonic_array, $clos_mnemonic; push @operands_array, $clos_operands; push @code_string_array, $clos_code_string; push @flags_array, $clos_flags; if ($mnemonic eq $mnemonic_of_current_mnemonic_array) { # ok, same mnemonic as the previous one, # so the current object name goes to the list. push @instruction_variants_for_current_instruction_array, $clos_object_name; } else { # ok, this is a new mnemonic. # so we'll mark this as current mnemonic. $mnemonic_of_current_mnemonic_array = $mnemonic; push @each_mnemonic_only_once_array, $mnemonic; # we first push the old array (unless it empty), then clear it, # and then push the current object name to the cleared array. if (@instruction_variants_for_current_instruction_array) { # push the variants array, unless it empty. push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ]; } @instruction_variants_for_current_instruction_array = (); push @instruction_variants_for_current_instruction_array, $clos_object_name; } } } } # the last instruction instruction variants must be pushed too. if (@instruction_variants_for_current_instruction_array) { # push the variants array, unless it empty. push @instruction_variants_array, [ @instruction_variants_for_current_instruction_array ]; } close(FILE); # these objects need be created already during compilation. printf("(eval-when (:compile-toplevel :load-toplevel :execute)\n"); # print the code to create each instruction + operands combination object. for (my $i=0; $i <= $#mnemonic_array; $i++) { $clos_object_name = $object_name_array[$i]; $mnemonic = $mnemonic_array[$i]; $operands = $operands_array[$i]; $code_string = $code_string_array[$i]; $flags = $flags_array[$i]; # print the code to create a variant object. # each object here is a variant of a single instruction (or a single mnemonic). # actually printed as 6 lines to make it easier to read (for us humans, I mean), with an empty line in the end. printf("(setf %s (make-instance 'x86-asm-instruction\n:name %s\n:operands %s\n:code-string %s\n:arch-flags %s\n:is-variant t))", $clos_object_name, $mnemonic, $operands, $code_string, $flags); printf("\n\n"); } # print the code to create each instruction + operands combination object. # for (my $i=0; $i <= $#each_mnemonic_only_once_array; $i++) for my $i (0 .. $#instruction_variants_array) { $mnemonic = $each_mnemonic_only_once_array[$i]; # print the code to create a container object. printf("(setf %s (make-instance 'x86-asm-instruction :name \"%s\" :is-container t :variants (list \n", $mnemonic, $mnemonic); @instruction_variants_for_current_instruction_array = $instruction_variants_array[$i]; # for (my $j=0; $j <= $#instruction_variants_for_current_instruction_array; $j++) for my $j (0 .. $#{$instruction_variants_array[$i]} ) { printf("%s", $instruction_variants_array[$i][$j]); # print 3 closing brackets if this is the last variant. if ($j == $#{$instruction_variants_array[$i]}) { printf(")))"); } else { printf(" "); } } # if this is not the last instruction, print two newlines. if ($i < $#instruction_variants_array) { printf("\n\n"); } } # print the closing bracket to close `eval-when`. print(")"); exit; 
+7
performance lisp common-lisp sbcl clos
source share
1 answer

18636 warnings look very bad, start by removing all warnings.

I would start getting rid of EVAL-WHEN around all this. It makes no sense to me. Either upload the file directly, or copy and upload the file.

Also note that SBCL does not like (setf STOSB-void ...) when the variable is undefined. New top-level variables are entered using DEFVAR or DEFPARAMETER . SETF just sets them, but does not define them. This should help get rid of warnings.

Also :is-container t and :is-variant t smell how these properties should be converted to classes for inheritance (for example, like mixin). The container has options. Option has no options.

+10
source share

All Articles