Practical ARM Assembly Notes

Processor Architecture Fundamentals

At its most basic, a digital processor can be thought of as a group of circuits that perform basic operations and communicate the results of these operations with each other. These operations usually include basic arithmetic, logical comparisons, data handling, etc. All activity performed by the processor is an aggregate of these basic operations.

In order to carry out these functions, the processor's circuits represent numbers in binary format using a series components that are individually capable of holding an electrical charge. The lack of a charge is used to represent a 0, and the presence of a charge is used to represent a 1. This makes a working understanding of binary essential to understanding the manner in which digital processors perform their operations.

Understanding Binary

Binary is a positional number system. The number of distinct digits a positional number system provides for counting is referred to as its base. For the most common number system, base 10, these are the digits 0 - 9. Binary is a base 2 number system, meaning it only provides two digits to count with: 0 and 1.

In any positional number system, when all of the provided digits have been used, counting continues by moving to an adjacent space and adding a new place, with the value each place representing an increase by a power of the number system's base. In base 10, this means that, after the number 9 is reached, a 1 is added in a column to the left which is referred to as the "10's place." This process is then continued with 99, where the next column is the "100's place" and so-on. Because binary is a base 2 number system, the available digits end with 1, at which point the column to the left is added as the "2's place." The additional places then increase by a power of two in value, continuing to infinity:

The number 83 represented in binary:

Place	128	64	32	16	8	4	2	1
Digit	0	1	0	1	0	0	1	1

Infinity here is a theoretical, of course. In practice, a processor can only operate on as many of these binary digits, or bits, as it's circuitry can handle at one time. This is referred to as the processor's bit-ness (i.e. 32-bit, 64-bit, etc.)

About ARM Assembly

The basic operations a processor can perform can be invoked directly via instructions: Strings of bits that trigger specific functions when passed to the processor. These instructions typically consist of an opcode, which tells the processor which operation to perform along with any available flags and required operands in bit form. For example, the binary string below instructs certain ARM processors to add two values and store the result:

10010001000000000000010000000001

Assembly languages are low-level programming languages that provide a human-readable way to directly invoke these instructions. This is accomplished with mnemonics: Short keywords that represent processor opcodes in an easy to remember form. Due to the tight correlation between these mnemonics and the instructions they represent, assembly languages are architecture specific. The set of instructions a processor can understand and the the manner in which they are carried out are part of it's ISA (Instruction Set Architecture). These notes are primarily concerned with the 32-bit A32 ISA and the 64-bit A64 ISA for ARM processors.

What's in a Name?

As stated above, these notes concern the A32 and A64 ISAs. The ARM naming conventions, however, are far from straightforward. Originally, there were not A32 or A64 ISAs: Prior to ARMv8, ARM was a 32-bit architecture capable of 32-bit and 16-bit operations. When 64-bit support was introduced with ARMv8 the A64 ISA was introduced along with it, and the old ISA was subsequently renamed A32.

It's important to note that the two ISAs are entirely separate from each other: On post ARMv8 processors A32 instructions are executed via a special, 32-bit execution state: Aarch32. Similarly, A64 instructions are executed via the Aarch64 execution state. The two execution states are mutually exclusive, however: A processor in one execution state cannot execute instructions from the ISA associated with the other execution state.

About These Notes

These notes were originally based around a single YouTube tutorial series that was chosen for its clarity and brevity. The original tutorial series, however, only covered the 32-bit ARMv7 architecture. In order to more fully explore ARM assembly, additional research was done which resulted in the inclusion of examples for the A64 ISA assembly, among other deviations for the original source.

Due to the significant differences between the A32 and A64 ISAs mentioned above, however, source code for one is often incompatible with the other both syntactically and conceptually. While newer, post-ARMv8, processors are able to execute code written for the A32 ISA, they are only able to do so via a special execution state, which also prevents them from simultaneously executing A64 instructions. To work around this difficulty, sections have been divided into both A32 and A64 content where needed. Clicking the A32 or A64 button at the top of a section will cause the page to display content relevant to the associated ISA. Irrelevant content will be hidden. The A32 content is displayed by default:

Additionally, while many hours have been spent attempting to ensure that the information in these notes is accurate, they are in no way complete. The information here attempts to simplify ARM assembly, and as a result information is often excluded, either for brevity or parity between the way the two ISAs are presented.

Regarding Assemblers

It is important to note that that much of a language's behavior, including certain features and syntactical quirks, are dictated by the assembler. For this reason, the sources used for the information in these notes have been prioritized as follows:

The behavior observed from the examples in the related repository when used with the toolchains mentioned below (i.e. If the provided Docker container is used to run the examples, they should behave as outlined in these notes).
The official ARM Developer Documentation.
The YouTube series mentioned above.
Other miscellaneous sources.

For more information on the sources used, refer to the resources page.

Getting Started

The assembly source code from the various examples in these notes and the associated repository can be viewed and modified with any simple text editor. In order to assemble and link the the source code into executable files, however, appropriate toolchains will need to be installed. To run the resulting executable files, an aarch64 capable system is also required. More information on the recommended system and toolchains can be found below. Multiple options are provided in order to support a variety of development environments:

Docker

For simplicity, a Docker image has been created that already has the required software installed. The image is based on the latest version of Ubuntu and includes the toolchains required for assembly and execution along with their dependencies. For convenience, the Nano text editor is also included. Reference the docker file located here for more information.

The container is multi-platform and leverages QEMU for cross-architecture support. It can be run on any 64-bit system with the following command, adding any bind mounts or other volumes as desired:

$ docker run -it --name practical-arm-assembly joshreeves/practical-arm-assembly sh

To exit and stop the container once it's running, simply type the "exit" command:

$ exit

The following command can be used to start the container again and re-enter its terminal:

$ docker start -i practical-arm-assembly

Baremetal

The original examples were written and assembled on a Raspberry Pi 5 running Ubuntu Server 24.04.1. For simplicity, a similar system is recommended:

A 64-Bit ARM CPU
A Debian-based Linux distribution
The APT package manager

The following commands can be used to install the toolchains required to assemble and link the source code:

$ sudo apt-get install gcc-arm-linux-gnueabi -y # Toolchain required for 32-bit examples.

$ sudo apt-get install gcc-aarch64-linux-gnu -y # Toolchain required for 64-bit examples.

QEMU

If neither of the above options can be used, QEMU can be used to emulate the appropriate architecture. QEMU is a full system emulator that allows systems of one architecture to simulate a system of another architecture. For more information visit the QEMU site.

CPULator

Alternatively, the CPULator site can be used to execute most of the code from the 32-bit examples. This has the benefit of displaying the values held by various components of the processor as the program executes, but some commands cannot be accurately emulated (i.e. The system interrupt command). Only the 32-bit examples can be executed this way. The 64-bit examples are not supported.

More detailed instructions for all of the methods described above can be found on the Resources page.

Section 01: MOV and SVC