Admittedly, how you prefer to learn programming is a matter of personal preference.
But with regard to assembly languages in particular, I found an approach that was more useful to me than reading instruction set reference manuals and / or books on assembly language (where they exist).
What I usually do to understand how assembly works for a CPU that is new / unknown to me, on an OS platform that I haven't worked with yet, is to use the developer toolchain. That is:
Get yourself a (cross-)compiler and a disassembler for the target CPU. These days, the ubiquity of GNU gcc / binutils often means that this is gcc and objdump -d.
Create a bunch of small programs / small pieces of source code, for example:
    #include <stdio.h>

    extern int funcA(int arg);
    extern int funcB(int arg1, int arg2);
    extern int funcC(int arg1, int arg2, int arg3);
    extern int funcD(int arg1, int arg2, int arg3, int arg4);
    extern int funcE(int arg1, int arg2, int arg3, int arg4, int arg5);
    extern int funcF(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6);
    extern int funcG(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7);
    extern int funcH(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7);

    int main(int argc, char **argv)
    {
        /* Each call passes one more argument than the last, up to seven. */
        printf("sum of all funcs: %d\n",
            funcA(1) +
            funcB(2, 3) +
            funcC(4, 5, 6) +
            funcD(7, 8, 9, 10) +
            funcE(11, 12, 13, 14, 15) +
            funcF(16, 17, 18, 19, 20, 21) +
            funcG(22, 23, 24, 25, 26, 27, 28) +
            funcH(29, 30, 31, 32, 33, 34, 35));
        return 12345;
    }
Compile them with compiler optimization and disassemble the generated object code.
The code structure is simple enough to demonstrate how the ABI works with regard to calling functions, passing arguments and returning values, and register usage, i.e. which registers are preserved / clobbered when making function calls. It will also show you basic assembly code for initializing constant data, and "glue" such as accessing and managing the stack.
Extend this to simple C language constructs such as loops and if / else or switch. Always keep some calls to external undefined functions in, because this prevents the compiler's optimizer from throwing all your "test code" away, and when you use if() / switch() tests, predicate them on argc (or other function arguments), because the compiler cannot predict those (and therefore cannot optimize blocks of code away "unexpectedly").
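As a rough illustration of that step, here is a minimal sketch. The names ext_step, ext_case, loop_test and branch_test are made up for this example. The "external" functions are given weak stand-in definitions (a GCC / Clang extension, so this is an assumption about your toolchain) only so that the file can also be linked on its own. Since a weak definition may be replaced at link time, the compiler should still emit real calls. Swap them for plain extern declarations if you only want to disassemble the object file.

```c
/* control_flow.c: hypothetical test file for studying loops and branches.
 * Inspect with:  gcc -O2 -c control_flow.c && objdump -d control_flow.o
 */

/* Weak stand-ins for the "external" functions (assumes GCC or Clang). */
__attribute__((weak)) int ext_step(int i)  { return i; }
__attribute__((weak)) int ext_case(int id) { return id * 10; }

/* Loop whose trip count depends on an argument the compiler cannot predict. */
int loop_test(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += ext_step(i);   /* the call keeps the loop body alive */
    return total;
}

/* if / else and switch, both predicated on a function argument. */
int branch_test(int n)
{
    if (n < 0)
        return ext_case(0);

    switch (n) {
    case 0:  return ext_case(1);
    case 1:  return ext_case(2);
    case 2:  return ext_case(3);
    default: return ext_case(4);
    }
}
```

In the disassembly, look for how the loop counter is kept in a register across the call and how the switch is turned into compares or a jump table.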
Extend this to use struct {} and class {} definitions containing sequences of different primitive data types, to find out how the compiler arranges them in memory and which assembly instructions are used to access bytes / words / ints / longs / floats, etc.
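A possible starting point for the struct experiment is sketched below. The struct name mixed, its members and the helper functions are all invented for illustration. Offsets and sizes depend on the platform ABI, so the code prints them rather than assuming any values.

```c
/* layout.c: hypothetical test file for studying data layout and access.
 * Inspect with:  gcc -O2 -c layout.c && objdump -d layout.o
 */
#include <stddef.h>
#include <stdio.h>

/* A record mixing primitive types of different sizes. */
struct mixed {
    char   c;
    short  s;
    int    i;
    long   l;
    float  f;
    double d;
};

/* One accessor per member, so each load shows up as its own function. */
char   get_c(const struct mixed *m) { return m->c; }
short  get_s(const struct mixed *m) { return m->s; }
int    get_i(const struct mixed *m) { return m->i; }
long   get_l(const struct mixed *m) { return m->l; }
float  get_f(const struct mixed *m) { return m->f; }
double get_d(const struct mixed *m) { return m->d; }

/* Print the offsets the compiler chose, to compare with the disassembly. */
void print_layout(void)
{
    printf("c=%zu s=%zu i=%zu l=%zu f=%zu d=%zu size=%zu\n",
           offsetof(struct mixed, c), offsetof(struct mixed, s),
           offsetof(struct mixed, i), offsetof(struct mixed, l),
           offsetof(struct mixed, f), offsetof(struct mixed, d),
           sizeof(struct mixed));
}
```

The offsets printed by print_layout() should match the displacements used in the accessor functions' load instructions.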
You can intentionally vary all these fragments of test code (for example, use operations other than +), and / or make them more complicated, to learn more about specific parts of the instruction set and ABI.
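For instance, a handful of one-line functions like the following (names invented here) should be enough to see which instructions the target uses for subtraction, multiplication, signed versus unsigned division, shifts and bitwise operations. This is only a sketch of the idea, not an exhaustive list.

```c
/* ops.c: hypothetical variations on the arithmetic in the test code.
 * Inspect with:  gcc -O2 -c ops.c && objdump -d ops.o
 */
int      op_sub(int a, int b)                { return a - b; }
int      op_mul(int a, int b)                { return a * b; }
int      op_div(int a, int b)                { return a / b; }   /* signed   */
unsigned op_udiv(unsigned a, unsigned b)     { return a / b; }   /* unsigned */
unsigned op_shl(unsigned a, unsigned count)  { return a << count; }
int      op_and(int a, int b)                { return a & b; }
```

Because the operands arrive as function arguments, the compiler cannot fold the results at compile time and has to emit the actual instruction sequence.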
After you have done this and looked at the output, find a copy (electronic or not) of the platform ABI. It contains the rulebook on how the above is done and why it is done this way, and it will help you understand why these rules apply to a particular platform. Getting insight into this is important, because when you write your own assembly code you will have to interface with other, non-assembly parts (unless it is pure demo code). This is where you need to play by the rules, so even if you don't know them by heart, at least know where the rulebook is.
Only after that do I suggest you actually track down the instruction set reference for the specific platform.
This is because once you have gone through the above, you already have enough experience / have already seen enough to start with a small C program, compile it down to assembly source, modify it a bit, assemble and link it, and see whether your modification does what it should do.
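One way that round trip could look with the GNU toolchain is sketched in the comment below. The file names tweak.c and driver.c, and the function scale(), are made up for this example. The -S, -c and -o options are standard gcc options.

```c
/* tweak.c: hypothetical starting point for hand-editing compiler output.
 *
 * A possible workflow:
 *   gcc -O2 -S tweak.c -o tweak.s    compile C down to assembly source
 *   (edit tweak.s, e.g. change the constant that scale() adds)
 *   gcc -c tweak.s -o tweak.o        assemble the modified source
 *   gcc driver.c tweak.o -o tweak    link with a caller; driver.c is a
 *                                    hypothetical file that calls scale()
 */

/* Small enough that its assembly fits on one screen. */
int scale(int x)
{
    return x * 3 + 1;
}
```

If the driver prints scale(2), the unmodified build gives 7. After editing the constant in tweak.s, the printed value tells you whether your change did what you intended.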
Trying at this stage to use some more unusual / specialized instructions will be much easier, because you have already seen how function calls work and what glue code is needed for your assembly to interface with other parts of the program, and you have already used the toolchain, so you no longer need to start from scratch.
I.e., to sum it all up, my suggestion is to learn assembly top-down, not bottom-up.
Side note:
Why do I suggest using compiler optimization when looking at the assembly code generated by the compiler for such simple examples?
Well, the answer is that, counterintuitively to some, the generated assembly code is much simpler if you let the compiler optimize the hell out of things. Without optimization, compilers often create "dumb" code that, for example, puts all variables on the stack, saves and restores them from there for no reason you can see, or saves / restores / initializes registers just to overwrite them in the very next instruction, and much more. Because of this, the amount of emitted code is much larger. It is filled with cruft and is much more difficult to understand. Compiler optimization forces this cruft to be trimmed down to the essentials that you want to see in order to understand the platform ABI and assembly. Therefore, use compiler optimization.
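To see the difference for yourself, you could build the same tiny file at two optimization levels and disassemble both, roughly as sketched below. The file name and the function sum3() are invented for this example, and the exact output will differ per compiler version and target.

```c
/* opt_compare.c: hypothetical file for comparing optimization levels.
 *
 *   gcc -O0 -c opt_compare.c -o opt_O0.o && objdump -d opt_O0.o
 *   gcc -O2 -c opt_compare.c -o opt_O2.o && objdump -d opt_O2.o
 */

/* Local temporaries that an unoptimized build typically keeps on the stack. */
int sum3(int a, int b, int c)
{
    int ab  = a + b;
    int abc = ab + c;
    return abc;
}
```

Comparing the two listings side by side, the -O0 version is usually noticeably longer, and most of the extra instructions are stack traffic rather than the two additions you actually care about.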