Difference between revisions of "Tutorials/Assembler Tutorial"
Revision as of 09:07, 3 December 2014
Machine language is everywhere. Whether you are playing Call of Duty, surfing the internet or writing a document - it is machine language that is being executed inside your computer. No matter if you wrote your software in C, BASIC or Ruby, at execution time it has been translated to machine language. Machine language is the godfather of programming languages. Its commands are binary, for example "no operation" is 90h for x86. This is why assembler exists - it maps the machine language commands to mnemonics like "jmp" for "jump" or "nop" for "no operation". You see this is very low-level and I like low-level topics. So here I show you how I deal with machine language and assembler. I am using x86 Linux in the examples.
Endless loop
A "hello world" program in assembler is already advanced. So as a first lesson we will take a look at a program that does nothing but loop endlessly. Here it is:
endless.asm
global _start
_start:
        nop
        jmp _start
This assembler source code contains two commands, "nop" for "no operation" and "jmp" for "jump". The other two lines are a label (_start:) and meta-information (global _start makes the label _start visible to the linker, which uses it as the place where the program starts).
compile it
nasm -f elf64 endless.asm
link it
ld -s -o endless endless.o
call it
./endless
Now you will need to press Ctrl+C to stop the program. Note that this is possible because there is an operating system that gives time slices to the process and still watches for keypresses.
disassemble it
Now we want to take a look at the machine language in this program right? Here it is:
# objdump -M intel -d endless

endless:     file format elf64-x86-64

Disassembly of section .text:

0000000000400080 <.text>:
  400080:       90                      nop
  400081:       eb fd                   jmp    0x400080
So the byte "90" (hexadecimal) is machine code for "do nothing" or "no operation"; its mnemonic is "nop". "eb" is "jump" or "jmp"; its parameter says where to jump. It is a relative jump: the parameter is a signed one-byte offset, counted from the byte that follows the jump command. An offset of ff (-1) would mean jumping to the parameter byte of the jump itself, "jmp fe" (-2) means "jump to the jump command" and "jmp fd" (-3) means "jump to the byte before the jump command", which here is the nop.
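The offset arithmetic for the short jump can be sketched in C. This is only an illustration: the function name jmp_rel8_target is made up for this example, and it covers only the two-byte "eb xx" form of jmp.

```c
#include <stdint.h>

/* Hypothetical helper: target address of a short jump ("eb xx").
   The operand byte is a signed 8-bit offset, counted from the
   address of the instruction that follows the jump. */
uint64_t jmp_rel8_target(uint64_t jmp_addr, uint8_t operand)
{
    /* Interpret the byte as a signed value: 0xfd becomes -3. */
    int disp = operand >= 0x80 ? (int)operand - 0x100 : (int)operand;

    /* The jump is 2 bytes long: opcode byte plus operand byte. */
    uint64_t next = jmp_addr + 2;

    return next + (uint64_t)(int64_t)disp;
}
```

With the values from the disassembly, jmp_rel8_target(0x400081, 0xfd) gives 0x400080, the address of the nop.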
Hello world
We now create a hello world program in C. Then we compile and disassemble it. So we have the C compiler translate it into machine language and then we use a disassembler to translate it into assembler. This is the program:
hello.c
#include <stdio.h>

int main()
{
  int i=0x23;
  printf("hello world");
}
Now we compile it:
gcc hello.c -o hello
and see that it runs:
./hello
hello world
To disassemble it, say
objdump -M intel -d hello
And the result for the main section is:
000000000040053c <main>:
  40053c:       55                      push   rbp
  40053d:       48 89 e5                mov    rbp,rsp
  400540:       48 83 ec 20             sub    rsp,0x20
  400544:       c7 45 fc 23 00 00 00    mov    DWORD PTR [rbp-0x4],0x23
  40054b:       bf 4c 06 40 00          mov    edi,0x40064c
  400550:       b8 00 00 00 00          mov    eax,0x0
  400555:       e8 d6 fe ff ff          call   400430 <printf@plt>
  40055a:       c9                      leave
  40055b:       c3                      ret
To understand this you should know that every processor has a set of registers; eax, edi, rbp and rsp are such registers. The "push rbp" command is only one byte, 55 hexadecimal, and means that the processor stores the value of its register rbp on the stack so that it can be restored later with the pop command (here the "leave" command near the end takes care of that). The "mov" command stands for "move" and copies a value: from one register into another register, from a constant into a register, or from a constant into RAM. Note that this one mnemonic translates, depending on its operands, to quite different bytes in machine language, in the above example b8, bf, c7 and 48 89. b8 is followed by 4 bytes of parameter, 48 89 by only one. "sub" stands for "subtract". "call" calls a function that is in memory, in this case the library function printf. "ret" stands for "return": it ends main and returns to the caller, which is the C runtime startup code that in turn hands control back to the operating system. The actual "hello world" string is stored not in <main> but in a read-only data section (.rodata), while main lives in the "text" section, which is the "code" section that gets executed. You can find the string in the binary with the strings command:
tweedleburg:~ # strings hello
/lib64/ld-linux-x86-64.so.2
libc.so.6
printf
__libc_start_main
__gmon_start__
GLIBC_2.2.5
UH-@
UH-@
[]A\A]A^A_
hello world
;*3$"
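The "call" line in the disassembly of main is also a relative jump, only with a 4-byte offset. As a rough sketch (rel32_target is a name invented for this example, not something objdump or gcc provides), the target address that objdump prints can be recomputed like this:

```c
#include <stdint.h>

/* Hypothetical helper: target of an instruction that ends in a signed
   32-bit relative offset, such as "call" (e8 + 4 bytes).
   insn_len is the full length of the instruction in bytes. */
uint64_t rel32_target(uint64_t insn_addr, unsigned insn_len,
                      const uint8_t disp_bytes[4])
{
    /* x86 stores multi-byte values with the lowest byte first. */
    uint32_t raw = (uint32_t)disp_bytes[0]
                 | (uint32_t)disp_bytes[1] << 8
                 | (uint32_t)disp_bytes[2] << 16
                 | (uint32_t)disp_bytes[3] << 24;

    /* Interpret the 32 bits as a signed offset. */
    int64_t disp = raw >= 0x80000000u ? (int64_t)raw - 0x100000000LL
                                      : (int64_t)raw;

    /* The offset counts from the instruction after this one. */
    return insn_addr + insn_len + (uint64_t)disp;
}
```

For the bytes e8 d6 fe ff ff at address 400555 the offset is d6 fe ff ff, the instruction is 5 bytes long, and the function returns 0x400430, which matches the address of printf@plt in the listing.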
translate C to assembler
To learn the syntax that gcc uses for assembler (the AT&T syntax of the GNU assembler), let's write a C program and compile it without assembling it. Here is the C program, hello.c:
#include <stdio.h>

int main()
{
  int i=0x23;
  printf("hello world");
}
Now we compile this without assembling it:
# gcc -o hello.asm -S hello.c
Now we have the program transformed to assembler and take a look at it:
# cat hello.asm
        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "hello world"
        .text
.globl main
        .type   main, @function
main:
.LFB2:
        pushq   %rbp
.LCFI0:
        movq    %rsp, %rbp
.LCFI1:
        subq    $32, %rsp
.LCFI2:
        movl    $35, -4(%rbp)
        movl    $.LC0, %edi
        movl    $0, %eax
        call    printf
[...]
Now we know the syntax of gcc assembler and we can finally write a program that consists of an endless loop:
.text
.globl main
main:
start:
        nop; jmp start
Create a boot sector
Under program your own OS I show how to create a boot sector for your own operating system. The challenge here is that the executable code must not be longer than 512 bytes. Here is how we do it:
- create a file kernel.s
kernel.s
start:
        ; this should print H
        mov ax, 0xe48
        mov bx, 7
        int 0x10
        ; E
        mov ax, 0xe45
        int 0x10
        ; L
        mov ax, 0xe4C
        int 0x10
        ; L
        mov ax, 0xe4C
        int 0x10
        ; O
        mov ax, 0xe4F
        int 0x10
.ende:  jmp .ende
You may note that we say "mov ax,..." here while in the previous example we have seen "mov eax,...". The reason is that a boot sector is executed in 16-bit real mode: ax is the lower 16 bits of the 32-bit register eax.
- translate this assembler code into machine language:
nasm kernel.s
- the result is the file kernel. Let's look at it:
tweedleburg:~ # ll kernel
-rw-r--r-- 1 root root 30 Nov 27 21:29 kernel
tweedleburg:~ # hexdump -C kernel
00000000  b8 48 0e bb 07 00 cd 10  b8 45 0e cd 10 b8 4c 0e  |.H.......E....L.|
00000010  cd 10 b8 4c 0e cd 10 b8  4f 0e cd 10 eb fe        |...L....O.....|
0000001e
tweedleburg:~ #
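You see the mov ax assembler command is again translated to b8 as a byte in machine language. In 16-bit code it is followed by a 2-byte value (48 0e, the low byte first) instead of the 4-byte value that followed mov eax in the earlier example. You see all assembler commands are translated and there is nothing but machine language in that file. If you want to use this, see programming your own OS.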
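The 30 bytes above are only the code. On a PC BIOS a boot sector is a full 512-byte sector whose last two bytes are the signature 55 aa, which leaves 510 bytes for code. How the companion article builds the final image is not shown here, so the following is just a sketch of one way to do the padding in C; build_boot_sector and SECTOR_SIZE are names made up for this example.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical helper: copy raw machine code into a 512-byte sector,
   pad the rest with zero bytes and append the 55 aa boot signature.
   Returns 0 on success, -1 if the code does not fit into 510 bytes. */
int build_boot_sector(const uint8_t *code, size_t code_len,
                      uint8_t sector[SECTOR_SIZE])
{
    if (code_len > SECTOR_SIZE - 2)
        return -1;                      /* no room left for the signature */

    memset(sector, 0, SECTOR_SIZE);     /* zero padding */
    memcpy(sector, code, code_len);     /* code starts at offset 0 */
    sector[SECTOR_SIZE - 2] = 0x55;     /* signature, byte 510 */
    sector[SECTOR_SIZE - 1] = 0xaa;     /* signature, byte 511 */
    return 0;
}
```

The same result can be had inside the assembler source with padding directives; doing it in C here only makes the layout of the sector explicit.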
x86 assembler
Throughout this article we are talking about Intel x86 assembler. Only here I want to give you an example for ARM assembler that I got from http://linuxintro.org/wiki/objDump:
Nokia-N810-43-7:~# objdump -d a.out | head

a.out:     file format elf32-littlearm

Disassembly of section .init:

000084f8 <_init>:
    84f8:       e52de004        str     lr, [sp, #-4]!
    84fc:       e24dd004        sub     sp, sp, #4      ; 0x4
    8500:       eb000035        bl      85dc <call_gmon_start>
    8504:       e28dd004        add     sp, sp, #4      ; 0x4
The main difference is, as you can see, that this machine language has a fixed-width command set: every command consists of 4 bytes. That makes it easy to jump to the command that is one, two or any number of commands away. While the machine language is completely different, there are many assembler mnemonics that also exist in x86 assembler.
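Because every command is 4 bytes wide, the ARM branch commands b and bl store their offset as a number of commands rather than bytes. A small C sketch, assuming the classic 32-bit ARM encoding (lower 24 bits are a signed offset, counted from the branch address plus 8), reproduces the target that objdump printed for the bl line; arm_branch_target is a name invented for this example.

```c
#include <stdint.h>

/* Hypothetical helper: target of a 32-bit ARM "b" or "bl" instruction.
   The low 24 bits hold a signed offset measured in 4-byte words,
   relative to the address of the branch plus 8. */
uint32_t arm_branch_target(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00ffffffu;

    /* Sign-extend the 24-bit field. */
    int32_t words = (imm24 & 0x00800000u) ? (int32_t)imm24 - 0x01000000
                                          : (int32_t)imm24;

    /* One word is one instruction, that is 4 bytes. */
    return insn_addr + 8 + (uint32_t)(words * 4);
}
```

For the instruction eb000035 at address 8500 the offset field is 0x35 words, so the function returns 0x8500 + 8 + 0xd4 = 0x85dc, the address shown for call_gmon_start.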
run vlc as root
Knowing assembler can help you in situations where you need to modify already compiled programs. For example, when you start vlc as root you will get an error message:
VLC is not supposed to be run as root. Sorry.
If you need to use real-time priorities and/or privileged TCP ports
you can use vlc-wrapper (make sure it is Set-UID root and
cannot be run by non-trusted users first).
If you - like me - do not see why root should not be allowed to run vlc, you can disassemble the code using objdump -d
objdump -d -M intel /usr/bin/vlc
[...]
  4010f9:       e8 32 0a 00 00          call   401b30 <unsetenv>
  4010fe:       e8 3d fe ff ff          call   400f40 <geteuid@plt>
  401103:       85 c0                   test   eax,eax
  401105:       0f 84 04 06 00 00       je     40170f <fflush@plt+0x66f>
  40110b:       be ca 1f 40 00          mov    esi,0x401fca
  401110:       bf 06 00 00 00          mov    edi,0x6
[...]
As you can see, the program calls geteuid (the C library wrapper around the syscall of the same name). The return value is stored in register EAX. Then EAX is compared against 0 (test eax,eax). If it is 0, the zero flag in the processor is set. The next instruction is je ("jump if equal", that is, jump if the zero flag is set), a conditional jump. The solution is to replace the call to geteuid, which is 5 bytes long, by an equally long command that sets EAX to a value other than 0, for example mov eax,0x1:
b8 01 00 00 00
The value follows the b8 byte with its lowest byte first. With this change vlc will always run as if your effective user id was 1.
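To make the byte layout of such a patch concrete, here is a minimal C sketch that overwrites a 5-byte call (opcode e8) in a buffer with "mov eax, value" (opcode b8 followed by the value, lowest byte first). It works on bytes in memory only; the function name patch_call_with_mov_eax is made up for this example, and finding the right offset inside the vlc file (which is not the same as the address 4010fe shown by objdump) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: replace the 5-byte "call rel32" at buf[off]
   with "mov eax, value". Returns 0 on success, -1 if the range is
   out of bounds or the byte at buf[off] is not the call opcode e8. */
int patch_call_with_mov_eax(uint8_t *buf, size_t len, size_t off,
                            uint32_t value)
{
    if (off > len || len - off < 5)
        return -1;                       /* instruction would not fit */
    if (buf[off] != 0xe8)
        return -1;                       /* refuse to patch anything else */

    buf[off]     = 0xb8;                 /* opcode: mov eax, imm32 */
    buf[off + 1] = (uint8_t)(value & 0xff);          /* lowest byte first */
    buf[off + 2] = (uint8_t)((value >> 8) & 0xff);
    buf[off + 3] = (uint8_t)((value >> 16) & 0xff);
    buf[off + 4] = (uint8_t)((value >> 24) & 0xff);
    return 0;
}
```

Checking for the e8 opcode before writing is a small safety net against patching at a wrong offset. Work on a copy of the binary so the original stays available.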