Just to be thorough: the first thing to do is collect profile data, and the second is to reconsider your algorithms. I'm sure you know this, but both belong in any discussion of performance.
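As a crude first measurement before reaching for a real profiler, something like the following can show where the time goes. This is only a sketch: `time_ms` is a hypothetical helper name, not part of any library, and a single wall-clock reading is no substitute for proper profile data.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not a library function): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    // steady_clock is monotonic, so the difference cannot go negative.
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

You would wrap the suspect section in a lambda, e.g. `time_ms([&] { run_hot_loop(); })` (where `run_hot_loop` stands for your own code), and compare readings before and after a change.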
To answer your question directly, "Will switching to ASM help?": if you don't know the answer to that question, then probably not. Unless you are very familiar with the processor architecture and its quirks, you are unlikely to do much better on your code than a good optimizing C/C++ compiler.
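Beyond that, the big speedups (algorithms aside) tend to come from parallelism. A machine with 4 or 8 cores leaves a lot on the table if you only use one of them. From C/C++, OpenMP is an easy way in: you annotate a loop and the compiler distributes the iterations (you still have to watch for dependencies between iterations, but it is about the simplest form of parallelism available).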
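To make the OpenMP point concrete, here is a minimal sketch of a parallel loop. The function name `sum_of_squares` is made up for illustration. It assumes a compiler with OpenMP enabled (for example `-fopenmp` on GCC or Clang); without that flag the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example function; the name and signature are illustrative only.
double sum_of_squares(const std::vector<double>& values) {
    double sum = 0.0;
    // A signed loop index keeps this compatible with older OpenMP versions.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(values.size());

    // Each thread accumulates a private partial sum; the reduction clause
    // combines the partial sums when the loop ends.
    #pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const double v = values[static_cast<std::size_t>(i)];
        sum += v * v;
    }
    return sum;
}
```

The `reduction` clause is what makes this safe: `sum` would otherwise be shared and written by every thread at once. Loops where one iteration reads what an earlier one wrote cannot be parallelized this simply.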
This also depends on which C/C++ compiler you use. The Intel C++ compiler supports OpenMP, and Intel also provides Threading Building Blocks.
, , ++, " , / parallelism ". "" , , , , , , .