Difference between revisions of "Tutorials/Assembler Tutorial"
Revision as of 19:00, 27 November 2014
Everything that is executed on a computer is executed in machine language. If you develop software in PHP, this software is interpreted by the PHP interpreter, which is itself available in machine language. If you write software in C, the C compiler translates your source code into machine language, a process known as compiling. Machine language is the godfather of programming languages. Assembler language writes machine language as mnemonics, where one mnemonic stands for one command in machine language, and an assembler translates these mnemonics back into machine language. You see, this is very low-level, and I like low-level topics. So here I show you how I deal with machine language and assembler. I am using 64-bit x86 Linux in the examples.
Endless loop
A "hello world" program in assembler is already advanced. So as a first lesson we will take a look at a program that does nothing but run an endless loop. Here it is:
endless.asm
global _start
_start:
        nop
        jmp _start
This assembler source code contains two commands, "nop" for "no operation" and "jmp" for "jump". The other two lines are a label (_start:) and meta-information (global _start, which makes the label "_start" visible to the linker as the place where the program starts).
assemble it
nasm -f elf64 endless.asm
link it
ld -s -o endless endless.o
run it (the program never returns, so stop it with Ctrl+C)
./endless
Hello world
We now create a hello world program in C, compile it, run it and disassemble it:
cat hello.c
#include <stdio.h>
int main()
{
  int i=0x23;
  printf("hello world");
}
gcc hello.c -o hello
./hello
hello world
objdump -d hello
We see a lot of output here. We are interested in the main routine:
000000000040053c <main>:
  40053c:  55                     push   %rbp
  40053d:  48 89 e5               mov    %rsp,%rbp
  400540:  48 83 ec 20            sub    $0x20,%rsp
  400544:  c7 45 fc 23 00 00 00   movl   $0x23,-0x4(%rbp)
  40054b:  bf 4c 06 40 00         mov    $0x40064c,%edi
  400550:  b8 00 00 00 00         mov    $0x0,%eax
  400555:  e8 d6 fe ff ff         callq  400430 <printf@plt>
You may wonder what the command callq means. You cannot use it in nasm. Well, callq is how nasm's "call" mnemonic is spelled in the AT&T syntax that gcc and objdump use by default; the q suffix marks a 64-bit operand. nasm uses Intel syntax, so we had better tell objdump to present us with that syntax:
objdump -M intel -d hello
And the result for the main routine is as follows. Note that the operand order has changed as well: in Intel syntax the destination comes first.
000000000040053c <main>:
  40053c:  55                     push   rbp
  40053d:  48 89 e5               mov    rbp,rsp
  400540:  48 83 ec 20            sub    rsp,0x20
  400544:  c7 45 fc 23 00 00 00   mov    DWORD PTR [rbp-0x4],0x23
  40054b:  bf 4c 06 40 00         mov    edi,0x40064c
  400550:  b8 00 00 00 00         mov    eax,0x0
  400555:  e8 d6 fe ff ff         call   400430 <printf@plt>
  40055a:  c9                     leave
  40055b:  c3                     ret
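The hex bytes between the address and the mnemonic are the machine language itself. As a small sketch of how the bytes e8 d6 fe ff ff at address 400555 turn into "call 400430", the following C function decodes that one instruction. The name call_target is made up for this illustration, and the function only handles the form of call shown above (opcode e8 followed by a 32-bit little-endian displacement that is relative to the next instruction).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration only: compute the target address of
 * an x86-64 near call in its rel32 form (opcode 0xe8 + 4 displacement bytes).
 * insn_addr is the address of the call instruction itself. */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xe8);   /* other encodings of call are not handled */

    /* The displacement is stored least significant byte first. */
    uint32_t raw = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);

    /* Sign-extend the 32-bit displacement to 64 bits using unsigned
     * arithmetic, so a negative displacement wraps around correctly. */
    uint64_t disp = raw;
    if (raw & 0x80000000u)
        disp |= 0xffffffff00000000ull;

    /* The displacement counts from the instruction after the call,
     * and this form of call is 5 bytes long. */
    return insn_addr + 5 + disp;
}
```

For the call above, the displacement bytes d6 fe ff ff read as 0xfffffed6, which is -0x12a. The next instruction is at 40055a, and 40055a - 12a = 400430, the address objdump prints for printf@plt.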
GCC assembler
To learn the syntax of a gcc assembler program, let's write a C program and compile it without assembling it. Here is the C program, hello.c:
#include <stdio.h>
int main()
{
  int i=0x23;
  printf("hello world");
}
Now we compile this without assembling it:
# gcc -o hello.asm -S hello.c
Now the program has been translated to assembler. Let's take a look at it:
# cat hello.asm
        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "hello world"
        .text
.globl main
        .type   main, @function
main:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        subq    $32, %rsp
.LCFI2:
        movl    $35, -4(%rbp)
        movl    $.LC0, %edi
        movl    $0, %eax
        call    printf
[...]
Now we know the syntax of gcc assembler and we can finally write a program that consists of an endless loop:
.text
.globl main
main:
start:
        nop; jmp start