Tutorials/Assembler Tutorial

Everything a computer executes is ultimately executed as machine language. If you develop software in PHP, the PHP interpreter runs it, and that interpreter itself exists as machine code. If you write software in C, the C compiler translates your source code into machine language, a process known as compiling. Machine language is the godfather of programming languages. Assembly language gives each machine instruction a readable mnemonic, and an assembler translates those mnemonics back into machine code; roughly, one mnemonic line stands for one machine instruction. This is very low-level, and I like low-level topics. So here I show you how I deal with machine language and assembler. The examples use x86-64 Linux.
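To make the "one mnemonic per machine instruction" idea concrete, here is a minimal C sketch of a lookup table for four one-byte x86-64 opcodes. The function name mnemonic_for is made up for this illustration. The bytes 55, c9 and c3 are the ones that show up in the objdump listing later in this tutorial, and 90 is the standard encoding of nop. A real disassembler has to handle multi-byte instructions, prefixes and operands, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the mnemonic for a few one-byte x86-64 opcodes.
 * Returns NULL for every byte that is not in this tiny table.
 * (mnemonic_for is a hypothetical helper, not part of any tool.) */
const char *mnemonic_for(uint8_t opcode)
{
    switch (opcode) {
    case 0x55: return "push rbp";
    case 0x90: return "nop";
    case 0xc3: return "ret";
    case 0xc9: return "leave";
    default:   return NULL;
    }
}
```

Calling mnemonic_for(0x90) yields the string "nop"; a byte outside the table yields NULL.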

Endless loop

A "hello world" program in assembler is already advanced. So as a first lesson we will take a look at a program that does nothing but loop forever. Here it is:

endless.asm

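global _start        ; export _start so the linker can find it
_start:              ; label: the address of the next instruction
   nop               ; do nothing
   jmp _start        ; jump back to the label, forever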

This assembler source code contains two instructions: "nop" for "no operation" and "jmp" for "jump". The other two lines are a label (_start:) and a directive (global _start) that exports the symbol _start, which the linker uses by default as the address where the program starts.

assemble it

nasm -f elf64 endless.asm

link it

ld -s -o endless endless.o

run it (it never exits; press Ctrl+C to stop it)

./endless

Hello world

We now create a hello world program and disassemble it:

cat hello.c
#include <stdio.h> 

int main()
{
  int i=0x23;
  printf("hello world");
}
gcc hello.c -o hello
./hello
hello world
objdump -d hello

We see a lot of output here. We are interested in the main routine:

000000000040053c <main>:
 40053c:       55                      push   %rbp
 40053d:       48 89 e5                mov    %rsp,%rbp
 400540:       48 83 ec 20             sub    $0x20,%rsp
 400544:       c7 45 fc 23 00 00 00    movl   $0x23,-0x4(%rbp)
 40054b:       bf 4c 06 40 00          mov    $0x40064c,%edi  
 400550:       b8 00 00 00 00          mov    $0x0,%eax       
 400555:       e8 d6 fe ff ff          callq  400430 <printf@plt>

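You may wonder what the command callq means. You cannot use it in nasm. Well, callq is the AT&T-syntax spelling, used by default by the GNU tools (gcc, as, objdump), of the instruction that nasm calls "call"; the q suffix marks a 64-bit operand. AT&T syntax also writes the source operand before the destination and prefixes registers with % and immediate values with $. So let us tell objdump to use Intel syntax, which is what nasm expects: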

objdump -M intel -d hello

And the result for the main routine is:

000000000040053c <main>:
 40053c:       55                      push   rbp
 40053d:       48 89 e5                mov    rbp,rsp
 400540:       48 83 ec 20             sub    rsp,0x20
 400544:       c7 45 fc 23 00 00 00    mov    DWORD PTR [rbp-0x4],0x23
 40054b:       bf 4c 06 40 00          mov    edi,0x40064c            
 400550:       b8 00 00 00 00          mov    eax,0x0                 
 400555:       e8 d6 fe ff ff          call   400430 <printf@plt>     
 40055a:       c9                      leave                          
 40055b:       c3                      ret
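The call line shows how the bytes relate to the address printed next to them. The opcode e8 is followed by a 32-bit displacement stored little-endian (d6 fe ff ff), and the processor adds it to the address of the next instruction. The following C sketch redoes that arithmetic; the function name call_target is made up for this illustration, and it only handles the 5-byte "call rel32" form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compute the target address of a 5-byte "call rel32" instruction.
 * insn_addr is the address of the e8 opcode byte itself.
 * (call_target is a hypothetical helper for illustration only.) */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5])
{
    uint32_t raw;
    int32_t disp;

    assert(insn[0] == 0xe8);  /* only the rel32 call form is handled */

    /* The displacement bytes are stored least significant byte first. */
    raw = (uint32_t)insn[1]
        | (uint32_t)insn[2] << 8
        | (uint32_t)insn[3] << 16
        | (uint32_t)insn[4] << 24;

    /* Reinterpret the same 32 bits as a signed value. */
    memcpy(&disp, &raw, sizeof disp);

    /* The displacement is relative to the next instruction (addr + 5). */
    return insn_addr + 5 + (uint64_t)(int64_t)disp;
}
```

For the listing above, the instruction sits at 0x400555 and its displacement is 0xfffffed6, that is -0x12a. The next instruction is at 0x40055a, and 0x40055a - 0x12a = 0x400430, the address objdump prints for printf@plt.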

GCC assembler

To learn the syntax that gcc's own assembler (the GNU assembler, as) expects, let's write a C program and compile it without assembling it. Here is the C program, hello.c:

#include <stdio.h>

int main()
{
  int i=0x23;
  printf("hello world");
}

Now we compile this without assembling it:

# gcc -o hello.asm -S hello.c

Now the program has been translated to assembler. Let's take a look at it:

# cat hello.asm              
        .file   "hello.c"                        
        .section        .rodata                  
.LC0:                                            
        .string "hello world"                    
        .text                                    
.globl main                                      
        .type   main, @function                  
main:                                            
.LFB2:                                           
        pushq   %rbp                             
.LCFI0:                                          
        movq    %rsp, %rbp                       
.LCFI1:                                          
        subq    $32, %rsp                        
.LCFI2:                                          
        movl    $35, -4(%rbp)                    
        movl    $.LC0, %edi                      
        movl    $0, %eax                         
        call    printf  
[...]
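Compared with the objdump listing earlier, gcc prints the same numbers in decimal: $35 is 0x23 and $32 is 0x20. The -4 in -4(%rbp) is the byte fc in the encoding c7 45 fc 23 00 00 00, read as a signed 8-bit value. A small C sketch of that correspondence follows; the names disp8_value, STACK_BYTES and I_VALUE are made up for this illustration.

```c
#include <stdint.h>

/* Interpret one encoded byte as a signed 8-bit displacement
 * (two's complement): 0x00..0x7f stay as they are, 0x80..0xff
 * become -128..-1. (disp8_value is a hypothetical helper.) */
int disp8_value(uint8_t byte)
{
    return byte < 0x80 ? (int)byte : (int)byte - 0x100;
}

/* The same constants in the two notations used by the listings. */
enum {
    STACK_BYTES = 0x20,  /* objdump: sub rsp,0x20    gcc -S: subq $32, %rsp */
    I_VALUE     = 0x23   /* objdump: 0x23            gcc -S: $35            */
};
```

Here disp8_value(0xfc) gives -4, which is why the local variable i lives four bytes below rbp.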

Now we know the syntax of the gcc assembler and can finally write the endless loop in it. Saved as, for example, endless.s, it can be built with gcc -o endless endless.s:

.text
.globl main            # export main; the C startup code calls it
main:
start:
  nop                  # do nothing
  jmp start            # jump back to start, forever
