Tutorials/Assembler Tutorial

From ThorstensHome
Revision as of 22:18, 27 November 2014 by ThorstenStaerk (Talk | contribs)

Everything that is executed on a computer is executed in machine language. If you develop software in PHP, your program is interpreted by the PHP interpreter, and that interpreter is itself available in machine language. If you write software in C, the C compiler translates your source code into machine language, a process known as compiling. Machine language is the godfather of programming languages. Assembly language writes machine language as mnemonics, where one mnemonic stands for one command in machine language, and an assembler is the program that translates those mnemonics into machine language. You see this is very low-level and I like low-level topics. So here I show you how I deal with machine language and assembler. I am using 64-bit x86 (x86-64) Linux in the examples.

Endless loop

A "hello world" program in assembler is already advanced. So as a first lesson we will take a look at a program that does nothing but loop endlessly. Here it is:

endless.asm

global _start
_start:
   nop
   jmp _start

This assembler source code contains two commands, "nop" for "no operation" and "jmp" for "jump". The other two lines are a label (_start:) and meta-information: "global _start" makes the label _start visible to the linker, which uses _start as the point where the program starts.

assemble it

nasm -f elf64 endless.asm

link it

ld -s -o endless endless.o

run it (the program never ends on its own; stop it with Ctrl+C)

./endless

Hello world

We now create a hello world program in C. Then we compile and disassemble it. So we have the C compiler translate it into machine language and then we use a disassembler to translate it into assembler. This is the program:

hello.c

#include <stdio.h> 

int main()
{
  int i=0x23;
  printf("hello world");
}

Now we compile it:

gcc hello.c -o hello

and see that it runs:

./hello
hello world

To disassemble it, run

objdump -M intel -d hello

And the result for the main function is:

000000000040053c <main>:
 40053c:       55                      push   rbp
 40053d:       48 89 e5                mov    rbp,rsp
 400540:       48 83 ec 20             sub    rsp,0x20
 400544:       c7 45 fc 23 00 00 00    mov    DWORD PTR [rbp-0x4],0x23
 40054b:       bf 4c 06 40 00          mov    edi,0x40064c            
 400550:       b8 00 00 00 00          mov    eax,0x0                 
 400555:       e8 d6 fe ff ff          call   400430 <printf@plt>     
 40055a:       c9                      leave                          
 40055b:       c3                      ret

To understand this you should know that every processor has a set of registers. eax, edi, rbp and rsp are such registers. The "push rbp" command is only one byte, 55 hexadecimal, and means that the processor takes the value of its register rbp and stores it on the stack in memory, so it can be restored later using the pop command.

The "mov" command stands for "move" and says that one register's value is copied into another register, or a constant is copied into a register, or a value is copied into RAM. Note that this command ("mov") translates, depending on its exact operands, to quite different bytes in machine language, in the above example b8, bf, c7 and 48 89. b8 is followed by 4 bytes of parameters, 48 89 by only one.

sub stands for "subtract"; here it reserves 0x20 = 32 bytes on the stack for local variables such as i. "call" calls a function that is in memory, in this case printf from the C library. "leave" undoes the stack setup done by the first three commands, and ret stands for "return": it ends main and returns to the code that called it. For main that is not the operating system directly but the C runtime's startup code (see __libc_start_main in the output below), which then ends the program.

Note that the "text" section is the "code" section; it is the section that will be executed. The actual "hello world" string is not stored among the instructions of <main> but in a data section (the read-only data section, .rodata); the "mov edi,0x40064c" command passes its address to printf. You can see the string in the binary with the strings command:

tweedleburg:~ # strings hello
/lib64/ld-linux-x86-64.so.2
libc.so.6
printf
__libc_start_main
__gmon_start__
GLIBC_2.2.5
UH-@
UH-@
[]A\A]A^A_
hello world
;*3$"

GCC assembler

To learn the syntax of a gcc assembler program, let's write a C program and compile it without assembling it. Here is the C program, hello.c:

#include <stdio.h>

int main()
{
  int i=0x23;
  printf("hello world");
}

Now we compile this without assembling it:

# gcc -o hello.asm -S hello.c

Now we have the program transformed to assembler and take a look at it:

# cat hello.asm              
        .file   "hello.c"                        
        .section        .rodata                  
.LC0:                                            
        .string "hello world"                    
        .text                                    
.globl main                                      
        .type   main, @function                  
main:                                            
.LFB2:                                           
        pushq   %rbp                             
.LCFI0:                                          
        movq    %rsp, %rbp                       
.LCFI1:                                          
        subq    $32, %rsp                        
.LCFI2:                                          
        movl    $35, -4(%rbp)                    
        movl    $.LC0, %edi                      
        movl    $0, %eax                         
        call    printf  
[...]

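Compared with the objdump output in the hello world section, which was in Intel syntax, gcc writes the source operand first and the destination second, prefixes registers with % and constants with $, and appends the operand size to the command (q for 64 bit, l for 32 bit). Now we know the syntax of gcc assembler and we can finally write a program that consists of an endless loop: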

.text
.globl main
main:
start:
  nop
  jmp start

See also