“I’m the operator with my pocket calculator.” -Kraftwerk
There has been much interest in assembly lately (whether the real 6502 or the fictional DCPU-16; I even created my own virtual 8-bit CPU called i808 in 2007), but little of this attention focuses on the architecture that is most popular in today’s computers. If you are reading this on a desktop, laptop, or server, your computer is most likely using x86-64 (or x86). x86-64 is the 64-bit superset of the 32-bit x86 architecture, and any modern CPU from AMD or Intel supports it. This document focuses on the most commonly used parts of x86-64.
Assembly language is the lowest level of abstraction in computers at which code is still human-readable. An assembler translates each instruction directly into the bytes of machine code that your computer’s processor executes.
Learning assembly is a useful exercise and will give you a deeper understanding of what takes place ‘under the hood’. While the vast majority of programming is done in high-level languages such as C, C++, and Java, it is sometimes advantageous to write small sections of code in assembly when execution speed is a high priority. For instance, code with heavy math calculations, such as in 3D games or scientific computing, can benefit significantly from the speedup that hand-written assembly can achieve.
In this document we will be using ‘Intel’ syntax instead of ‘AT&T’ syntax. Therefore, instructions that take two operands are written in the following form (AT&T syntax reverses the order):
opcode destination, source
Any numbers with the prefix ‘0x’ in this document are in hexadecimal (hex) format, a notation that x86-64 assemblers accept. If you’re not familiar with hex numbers, I recommend reading the Wikipedia article on hexadecimal before beginning.
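Registers are probably the most complicated part of the x86-64 architecture, mainly because of the carry-over from the legacy 32-bit and 16-bit x86 architectures. x86-64 has sixteen 64-bit general-purpose registers, which this document calls R0 - R15. Most assemblers only use numeric names for R8 - R15 and refer to R0 - R7 by their legacy x86 names (RAX, RCX, RDX, RBX, RSP, RBP, RSI, and RDI, in that order). NASM accepts numeric names for all sixteen after `%use altreg`. R4 (RSP) is the stack pointer, so in practice it is not free for general use. These registers can be broken down into separate parts by bit size. More information on register names and breakdowns can be found here.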
For instance, R0 is a 64-bit register (a quad word). If you only want to use its low 32 bits, they can be referenced as R0D (a double word). The low 16 bits are R0W (a word), and the low 8 bits are R0B (a byte). The legacy names for these parts of R0 are EAX, AX, and AL.
These D, W, and B suffixes are examples of carry-over from the days of the 16-bit word:
8 bits = 1 byte
16 bits = 2 bytes = 1 word
32 bits = 4 bytes = 2 words = 1 double word
64 bits = 8 bytes = 4 words = 1 quad word
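The following minimal C sketch models how the sub-registers relate to the full 64-bit register, assuming a register can be treated as a plain `uint64_t`. The function names are hypothetical. Reading a smaller part just takes the low bits. Writing differs by size: on real x86-64 hardware, writing a 32-bit part zero-extends into the whole register, while writing a 16- or 8-bit part leaves the other bits untouched.

```c
#include <stdint.h>

/* Hypothetical helpers: a register is modeled as a plain uint64_t. */

/* Reading a sub-register takes the low bits of the full 64-bit value. */
uint32_t read_d(uint64_t r) { return (uint32_t)r; }  /* e.g. R0D: low 32 bits */
uint16_t read_w(uint64_t r) { return (uint16_t)r; }  /* e.g. R0W: low 16 bits */
uint8_t  read_b(uint64_t r) { return (uint8_t)r; }   /* e.g. R0B: low 8 bits  */

/* Writing a 32-bit sub-register zero-extends: the upper 32 bits are cleared. */
uint64_t write_d(uint64_t r, uint32_t v) {
    (void)r;  /* the old value does not survive */
    return (uint64_t)v;
}

/* Writing a 16- or 8-bit sub-register keeps all the other bits. */
uint64_t write_w(uint64_t r, uint16_t v) { return (r & ~(uint64_t)0xFFFF) | v; }
uint64_t write_b(uint64_t r, uint8_t v)  { return (r & ~(uint64_t)0xFF) | v; }
```

The asymmetry is easy to trip over: after a 32-bit write, the upper half of the register is always zero, which is not true for 16- and 8-bit writes.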
Some instructions add further complications by implicitly using specific registers. This will be explored in more detail in the Multiplication and Division section.
The most basic operations are assigning a value to a register and copying a value from one register to another. In x86-64 this is done with the mov (move) instruction. The name is misleading: nothing is moved, because the value is copied and the source is left unchanged.
mov R0, 15                     ; Store the value 15 in R0
mov R1, R0                     ; Copy the value in R0 to R1
mov R3, 18446744073709551615   ; Store the largest unsigned 64-bit number (0xFFFFFFFFFFFFFFFF) in R3
We can add the value of one register to another:
mov R0, 11    ; Store the value 11 in R0
mov R1, 500   ; Store the value 500 in R1
add R0, R1    ; Add the value in R1 to R0; R0 now contains 511
We can also add a value to a register:
mov R0, 25    ; Store the value 25 in R0
add R0, 12    ; Add 12 to R0; R0 now contains 37
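Additions on 64-bit registers wrap around modulo 2^64 when the result is too large. The following hedged C sketch models `add dest, src` under that assumption. The name `add64` is hypothetical. The extra `carry` output mirrors the processor’s carry flag, which is set when the true sum did not fit in 64 bits.

```c
#include <stdint.h>

/* Hypothetical model of "add dest, src" on 64-bit registers.
   Unsigned arithmetic in C wraps modulo 2^64, just like the register. */
uint64_t add64(uint64_t dest, uint64_t src, int *carry) {
    uint64_t sum = dest + src;
    *carry = sum < dest;  /* the sum wrapped around: carry out of bit 63 (CF) */
    return sum;
}
```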
We can subtract the value of one register from another:
mov R15, 1337   ; Store the value 1337 in R15
mov R12, 55     ; Store the value 55 in R12
sub R15, R12    ; Subtract the value in R12 from R15; R15 now contains 1282
We can also subtract a value from a register:
mov R1, 123   ; Store the value 123 in R1
sub R1, 24    ; Subtract 24 from R1; R1 now contains 99
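Subtraction also wraps modulo 2^64. A result that would be negative comes out as its two’s-complement bit pattern, and the carry flag then signals a borrow. The following is a minimal C sketch of `sub dest, src` with a hypothetical function name, assuming registers are modeled as `uint64_t`.

```c
#include <stdint.h>

/* Hypothetical model of "sub dest, src" on 64-bit registers. */
uint64_t sub64(uint64_t dest, uint64_t src, int *borrow) {
    *borrow = dest < src;  /* CF is set when an unsigned borrow occurs */
    return dest - src;     /* wraps modulo 2^64: the two's-complement result */
}
```

Reading the same 64 bits as a signed number gives the negative value you might expect. For example, 5 - 7 yields the bit pattern of -2.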
In this section we will be using the mul and div instructions. These operations are more complicated because they implicitly use specific registers.
mov R0, 50   ; Store the value 50 in R0
mov R1, 12   ; Store the value 12 in R1
mul R1       ; Multiply R0 by R1 (unsigned). The product goes to R2:R0, so R2 = 0 and R0 = 600
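The initial number must be stored in R0. It can be multiplied by the value in any other register, which is the only operand mul takes. The full 128-bit product is stored in R2:R0, with the high 64 bits in R2 and the low 64 bits in R0. Whatever R2 held before is therefore overwritten. mul treats its operands as unsigned; imul is the signed counterpart.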
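To see where the two halves of the product come from, here is a minimal C sketch of unsigned `mul src` that computes the full 128-bit result into an R2:R0 pair. Standard C has no portable 128-bit integer, so this sketch splits each operand into 32-bit halves and adds up the partial products, as you would in long multiplication. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the unsigned 64-bit "mul src" instruction:
   R2:R0 = R0 * src, high half in r2, low half in r0. */
struct r2_r0 {
    uint64_t r2;  /* high 64 bits of the product */
    uint64_t r0;  /* low 64 bits of the product */
};

struct r2_r0 mul_u64(uint64_t r0, uint64_t src) {
    /* Split both operands into 32-bit halves so each partial product fits in 64 bits. */
    uint64_t a_lo = r0 & 0xFFFFFFFFu, a_hi = r0 >> 32;
    uint64_t b_lo = src & 0xFFFFFFFFu, b_hi = src >> 32;

    uint64_t p0 = a_lo * b_lo;  /* lands in bits 0..63   */
    uint64_t p1 = a_lo * b_hi;  /* lands in bits 32..95  */
    uint64_t p2 = a_hi * b_lo;  /* lands in bits 32..95  */
    uint64_t p3 = a_hi * b_hi;  /* lands in bits 64..127 */

    /* Sum everything that lands in bits 32..63, keeping the carry above bit 63. */
    uint64_t middle = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    struct r2_r0 result;
    result.r0 = (middle << 32) | (p0 & 0xFFFFFFFFu);
    result.r2 = p3 + (p1 >> 32) + (p2 >> 32) + (middle >> 32);
    return result;
}
```

For the example above, `mul_u64(50, 12)` gives r2 = 0 and r0 = 600. Multiplying the largest 64-bit value by 2 spills into R2, giving r2 = 1.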
mov R0, 800   ; Store the value 800 in R0
mov R2, 0     ; Clear R2 to 0
mov R3, 100   ; Store the value 100 in R3
div R3        ; Divide R2:R0 by R3. In this case R2 will be set to 0, and R0 will be set to 8
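Registers R2:R0 must hold the 128-bit dividend (R2 the high 64 bits, R0 the low 64 bits), while any other register can hold the divisor. After div executes, the quotient is stored in R0 and the remainder in R2. This is why R2 is cleared first: any leftover value in R2 would become part of the dividend. If the divisor is 0, or the quotient does not fit in 64 bits, the processor raises a divide error instead.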
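The following C sketch models unsigned `div src` on the 128-bit dividend R2:R0 using shift-and-subtract long division, one dividend bit at a time. The function name and the `-1` error return are hypothetical. The error return stands in for the divide error the real processor raises. The early check `r2 >= divisor` is exactly the condition under which the quotient would need more than 64 bits.

```c
#include <stdint.h>

/* Hypothetical model of the unsigned 64-bit "div src" instruction.
   The dividend is the 128-bit value r2:r0. On success, returns 0 and stores
   the quotient (new R0) in *quot and the remainder (new R2) in *rem.
   Returns -1 where the real CPU would raise a divide error. */
int div_u64(uint64_t r2, uint64_t r0, uint64_t divisor,
            uint64_t *quot, uint64_t *rem) {
    /* Division by zero, or a quotient that would not fit in 64 bits. */
    if (divisor == 0 || r2 >= divisor) {
        return -1;
    }
    uint64_t q = 0;
    uint64_t r = r2;  /* running remainder, seeded with the high half */
    for (int i = 63; i >= 0; i--) {
        uint64_t overflow = r >> 63;          /* bit about to be shifted out of r */
        r = (r << 1) | ((r0 >> i) & 1u);      /* bring down the next bit of r0 */
        if (overflow || r >= divisor) {
            r -= divisor;                     /* wraps back into range when overflow was set */
            q |= (uint64_t)1 << i;
        }
    }
    *quot = q;
    *rem = r;
    return 0;
}
```

For the example above, `div_u64(0, 800, 100, &q, &r)` gives a quotient of 8 and a remainder of 0. Leaving a nonzero value in R2 changes the dividend, or triggers the error path.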
Branching allows us to redirect the program flow based on certain conditions. These conditions can be checked using comparisons.
Comparisons let us compare a register with another register or with a value. The cmp instruction subtracts the second operand from the first, discards the result, and sets the processor’s status flags accordingly. We can then change the flow of execution based on these flags.
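The following hedged C sketch shows which flags `cmp a, b` sets and how a few conditional jumps read them. The struct and function names are hypothetical. The flag formulas follow the standard definitions for a 64-bit subtraction. Unsigned comparisons (jb) use the carry flag; signed comparisons (jl) use the sign and overflow flags together.

```c
#include <stdint.h>

/* Hypothetical model of the flags "cmp a, b" sets: it computes a - b,
   throws the result away, and keeps only the flags. */
typedef struct {
    int zf;  /* zero flag: a == b */
    int cf;  /* carry flag: unsigned borrow, i.e. a < b as unsigned values */
    int sf;  /* sign flag: top bit of a - b */
    int of;  /* overflow flag: a - b overflowed as a signed value */
} Flags;

Flags cmp64(uint64_t a, uint64_t b) {
    uint64_t res = a - b;
    Flags f;
    f.zf = (res == 0);
    f.cf = (a < b);
    f.sf = (int)(res >> 63);
    /* Signed overflow: a and b differ in sign, and the result's sign differs from a's. */
    f.of = (int)(((a ^ b) & (a ^ res)) >> 63);
    return f;
}

/* Conditions tested by a few conditional jumps. */
int cond_je(Flags f)  { return f.zf; }         /* jump if equal */
int cond_jne(Flags f) { return !f.zf; }        /* jump if not equal */
int cond_jb(Flags f)  { return f.cf; }         /* jump if below (unsigned <) */
int cond_jl(Flags f)  { return f.sf != f.of; } /* jump if less (signed <) */
```

For example, comparing 0xFFFFFFFFFFFFFFFF with 1 is "not below" as unsigned values but "less" as signed values, because the same bits read as -1.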
Let’s try something like a simple C ‘for’ loop.
mov R0, 0               ; Set R0 to 0
increment_loop:
    add R0, 1           ; Add 1 to R0
    cmp R0, 10          ; Compare the value in R0 to 10
    jne increment_loop  ; If they are not equal then jump to increment_loop
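The above code will loop 10 times. jne stands for ‘Jump if Not Equal’: execution jumps back to increment_loop as long as R0 does not contain the value 10. There are many other jump instructions. Examples include jmp (jump unconditionally), je (jump if equal), jg and jl (jump if greater or less, comparing signed values), and ja and jb (jump if above or below, comparing unsigned values).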
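Because increment_loop runs its body before testing the condition, its closest C equivalent is a do-while loop rather than a for loop. Here is a minimal C sketch with a hypothetical function name and an extra iteration counter added for illustration.

```c
#include <stdint.h>

/* C rendering of the increment_loop example. The body runs before the
   comparison, as in the assembly, so this is a do-while loop. */
uint64_t increment_loop(uint64_t limit, int *iterations) {
    uint64_t r0 = 0;          /* mov R0, 0 */
    *iterations = 0;
    do {
        r0 += 1;              /* add R0, 1 */
        *iterations += 1;
    } while (r0 != limit);    /* cmp R0, limit / jne increment_loop */
    return r0;
}
```

One consequence of the test-at-the-bottom shape: with a limit of 0, the loop would not stop after zero iterations. R0 would have to wrap all the way around 2^64 before it equals 0 again.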
Another kind of branch is a function call. The call instruction pushes the address of the next instruction onto the stack and jumps to a specific section of code. When that code executes ret, the address is popped off the stack, and execution resumes where it left off.
mov R0, 14                  ; Set R0 to 14
mov R1, 23                  ; Set R1 to 23
call add_and_subtract_one   ; Call the function; R0 becomes 14 + 23 - 1 = 36
cmp R0, 36
je test_function_success    ; If R0 == 36 then jump, if not then continue to next line
...
add_and_subtract_one:       ; Function to add R1 to R0 and then subtract 1
    add R0, R1
    sub R0, 1
    ret
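In C terms, add_and_subtract_one is an ordinary function whose arguments and result travel in registers. The following is a hedged sketch that treats R0 and R1 as parameters and the final R0 as the return value. This register usage is simply the convention this example chose; it is not necessarily the calling convention your operating system’s compilers use.

```c
#include <stdint.h>

/* C equivalent of add_and_subtract_one: inputs arrive in R0 and R1,
   and the result is left in R0 (a convention chosen by this example). */
uint64_t add_and_subtract_one(uint64_t r0, uint64_t r1) {
    r0 += r1;   /* add R0, R1 */
    r0 -= 1;    /* sub R0, 1  */
    return r0;  /* ret: resume at the instruction after the call */
}
```

Calling it with 14 and 23 returns 36, which is why the comparison after the call checks for 36.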
The registers can be used to read from and write to system memory. The mov instruction is used in the same way as before, but instead of a literal value we give a memory address enclosed in [square brackets].
mov R0, [0x200000]   ; Copy a 64-bit value from memory address 0x200000 to R0
mov [0x402000], R3   ; Copy a 64-bit value from R3 to memory address 0x402000
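A 64-bit memory access actually touches eight consecutive bytes. Because x86-64 is little-endian, the byte at the lowest address is the least significant one. The following minimal C sketch models memory as a flat byte array, which is an assumption for illustration. The function names `load64` and `store64` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "mov R0, [addr]": read 8 bytes, little-endian,
   so mem[addr] becomes the least significant byte of the result. */
uint64_t load64(const uint8_t *mem, size_t addr) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | mem[addr + (size_t)i];
    }
    return v;
}

/* Hypothetical model of "mov [addr], R3": the least significant byte
   goes to the lowest address. */
void store64(uint8_t *mem, size_t addr, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        mem[addr + (size_t)i] = (uint8_t)(v >> (8 * i));
    }
}
```

After storing 0x1122334455667788 at some address, the first byte in memory is 0x88, not 0x11.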
The stack is an area of memory used for storing temporary information. It is a last in, first out (LIFO) data structure: the push operation adds an item to the top of the stack, and the pop operation removes the item on top. If you were to push the numbers 5, 7, and 15 onto the stack, you would pop them off as 15 first, then 7, and lastly 5. In assembly, you can push registers onto the stack and pop them off later. This is useful when you want to save the value of a register while using that register for another purpose.
mov R0, 25   ; Store the value 25 in R0
push R0      ; Push the value in R0 onto the stack
mov R0, 12   ; Store the value 12 in R0
pop R0       ; Pop the top value of the stack into R0. In this case R0 is set to 25 again
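On x86-64 the stack grows toward lower addresses. push first moves the stack pointer (RSP) down by one 8-byte slot and then stores the value. pop loads the value and then moves the pointer back up. The following minimal C sketch models this with an array and an index standing in for RSP. The type, names, and fixed size are hypothetical, and the sketch does no bounds checking.

```c
#include <stdint.h>

#define STACK_SLOTS 16  /* hypothetical fixed size for the sketch */

/* Hypothetical model of the hardware stack: sp plays the role of RSP,
   counted in 8-byte slots, and starts past the end because the stack
   grows downward. */
typedef struct {
    uint64_t slots[STACK_SLOTS];
    int sp;  /* index of the current top; STACK_SLOTS means empty */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_SLOTS; }

/* push: move the pointer down one slot, then store. */
void push(Stack *s, uint64_t value) {
    s->sp -= 1;
    s->slots[s->sp] = value;
}

/* pop: load from the top, then move the pointer back up. */
uint64_t pop(Stack *s) {
    uint64_t value = s->slots[s->sp];
    s->sp += 1;
    return value;
}
```

Pushing 5, 7, and 15 and then popping three times yields 15, 7, 5. The stack pointer then ends where it started, which is why balanced pushes and pops leave the stack unchanged.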
There is no requirement to push and pop to/from the same register. For instance, both of these segments leave the same value in R1:
mov R1, R0   ; Copy the value in R0 to R1

push R0      ; Push the value in R0 onto the stack
pop R1       ; Pop the top value of the stack into R1
This document only scratches the surface of the instructions and functionality available in the x86-64 architecture.