Understanding Hello World – on Raspberry Pi
Like so many other people, over the last few weeks of lockdown, I’ve been trying to use the extra time that I’m not spending commuting to work to learn some new skills that I’ve always meant to get around to learning. In particular, I’ve been teaching myself ARM Assembly Language programming.
I’ve always been more interested in the low-level side of computer programming than I have been in applications. I’ve always (even as a child) wanted to know how the magic box that is the computer actually works. I’m not new to assembly programming: I’ve done all kinds of odds and ends of assembly over the years, from (emulated) 6502 on the KIM-Uno, to the real thing on a BBC model B; as well as more recently taking a course looking at the (also emulated) Atari 2600, and following along with Ben Eater’s fabulous breadboard 6502 series. The observant amongst you will note a common thread there – the iconic 6502 processor (yes; I was an ‘80s kid). In addition to this, I have also done some assembly programming for PIC microcontrollers, and (rather more years ago than I’d care to admit!) back when I was doing my A-Levels in college I learnt a little x86 too; but I’ve never done anything much with ARM CPUs. Given my professional interests in the Internet of Things & cybersecurity, I thought that learning ARM assembler would be fun (and potentially useful in the future too, perhaps).
Interacting with the Operating System
Unlike some of the older computer systems that I mentioned earlier, the vast majority of microprocessors (and even some microcontrollers) today run some sort of operating system. The point of an operating system is easily missed today because of their ubiquity; but they essentially exist to provide services to user programs. Without an OS, every application would have to include its own code to drive all of the peripheral devices: vital functionality such as reading a keyboard, displaying things on a screen, and providing an abstraction of storage devices to enable users to work with the concept of files.
In 2020 there are basically two types of OS in regular use. There are the POSIX (Portable Operating System Interface) standard-compliant OSes (as implemented in Linux, macOS, BSD, and to some extent in iOS and Android too); and Windows. Given that I am not a Windows user (apart from for Office stuff at work), the choice here was easy. So for my learning environment, I pulled out a Raspberry Pi (running Raspbian Linux) with its ARM Cortex-A72 processor.
I’ve heard it said that in order to learn to program in C, you need to understand everything before you can do anything; I’m not sure that’s really true for C, but I’d argue that it’s undeniably true for assembly language. When learning to program in a new language, the canonical introductory program is Hello World; but in many assembler books it’s often one of the very last things you’ll see. There are some good reasons for that: not least of which is that there’s almost no abstraction from the underlying OS when it comes to printing a message on the screen. Since we don’t have a handy print() function, we have to set about directly requesting that the OS shows our message to the user.
Hello, World; and down the rabbit-hole…
Since ultimately it’s the OS that’s going to be showing our message (regardless of the language we’re programming in) – we can start with a slightly easier problem – and work backwards to where we want to be… so let’s start with a simple Hello World program in C.
// hello.c
#include <stdio.h>

int main() {
    printf("Hello World!\n");
    return 0;
}
If you’ve ever written any C code before – you’ll immediately see that this is probably the simplest (useful) program that you can write. Now, of course, when we write a program in C we have to compile it – turning it from the nicely human-readable C-code, into actual machine-code: the binary 1s and 0s that the processor itself can actually execute.
Given that there is (by definition) a one-to-one mapping between assembler mnemonics, and machine-code instructions – we can turn any executable program into its assembly code equivalent by simply disassembling it. So let’s have a go at that…
If we build the code with gcc hello.c -o hello, we get an executable which does what we’d expect; and we can disassemble it using objdump -d hello.
If we look at that, we’ll see that there is a lot of assembly code: far more than we perhaps might expect for a simple program – and most of which (unhelpfully for us) isn’t really anything to do with our Hello World.
Regardless of the processor architecture – there are a few things that any CPU has in common. There will be one or more (usually more!) registers (tiny memory locations inside the CPU itself) which can be used to store data and perform operations on it; there’ll be some instructions to load data into those registers (either directly from the code, other registers, or from memory), and to store the data back into memory; and some control instructions to allow for things like branches and loops.
With this in mind, we can find the section corresponding to our main() function (<main> in the disassembly): we’ll see that (amongst other things) it looks like it calls (branches to – using the bl instruction) something called puts (which might ring a bell if we’ve ever used the C library function puts()).
00010404 <main>:
   10404: e92d4800  push {fp, lr}
   10408: e28db004  add fp, sp, #4
   1040c: e59f000c  ldr r0, [pc, #12] ; 10420 <main+0x1c>
   10410: ebffffb3  bl 102e4 <puts@plt>
   10414: e3a03000  mov r3, #0
   10418: e1a00003  mov r0, r3
   1041c: e8bd8800  pop {fp, pc}
   10420: 00010494  .word 0x00010494
Since we’re expecting the code to ask the OS to print the message for us, let’s now use the Linux strace command to see the system call that our code ends up making: using strace ./hello.
Again there’s lots of output which isn’t really relevant; but right at the bottom we’ll see something that finally looks like it might be starting to be useful…
write(1, "Hello World!\n", 13Hello World!) = 13
exit_group(0)
(As an aside, if we build our program with the -static compiler flag – e.g. gcc hello.c -static -o hello – then we’ll have a much shorter output, as the program has fewer external library calls to make. We’ll come back to this in a moment.)
So our original printf() function call has eventually been mapped down to a write() system call (with the printing happening as it’s called, in the middle of strace’s own output: hence the slightly messy output).
write(1, "Hello World!\n",13)=13
Let’s have a look at the man page for the write() call.
I should note here that, somewhat unhelpfully for us, there’s also a write command – which gets presented by default – so to find what we’re actually looking for we need to use: man 2 write.
The man page for the write command is in section 1; whereas the one we want is in section 2.
Once we have the right page (in the Linux Programmer’s Manual), we see the write() function – which is C’s wrapper for the underpinning system call.
NAME
write - write to a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
DESCRIPTION
write() writes up to count bytes from the buffer
starting at buf to the file referred to by the file
descriptor fd.
...
Now we have this knowledge, we can drop a bit further down the metaphorical stack – and try calling this function directly.
This function is from unistd.h, and that’s properly into POSIX-only territory. The unistd.h header is at the heart of the POSIX interface – essentially defining the C-wrapper to the underpinning OS’s built-in routines to actually do stuff.
So here’s our new C code example:
// hello2.c
#include <unistd.h>

int main() {
    write(1, "Hello World!\n", 13);
    return 0;
}
Note that we have to pass a couple of additional parameters to the write() function compared with printf(): the file descriptor for the file we want to write to; and the length of the string we want to write. In Linux pretty much everything is a file, including the display, or stdout as it’s known; and stdout has the file descriptor of 1.
If we build and run this – we see that we get the same output as before… So far, so good. So let’s have another look with objdump: objdump -d ./hello2
This time the resulting output is smaller – but there’s still plenty of it.
Our <main> section this time contains a branch to something else – write@plt.
00010404 <main>:
   10404: e92d4800  push {fp, lr}
   10408: e28db004  add fp, sp, #4
   1040c: e3a0200d  mov r2, #13
   10410: e59f1010  ldr r1, [pc, #16] ; 10428 <main+0x24>
   10414: e3a00001  mov r0, #1
   10418: ebffffb7  bl 102fc <write@plt>
   1041c: e3a03000  mov r3, #0
   10420: e1a00003  mov r0, r3
   10424: e8bd8800  pop {fp, pc}
   10428: 0001049c  .word 0x0001049c
Similarly, if we use strace we see another call to that write() function.
So let’s build a statically linked version of the program, so that we can see more of what’s going on. As I mentioned previously, this will result in a version of the program being built which contains its own copy of all of the library code it needs – this will be much larger than the default, dynamically linked, version – but it should show us more of what’s going on. To do this we use: gcc hello2.c -static -o hello2.
As you can see the new version is nearly 62x larger – which is one reason why static linking isn’t the default!
If we run it, it still gives the same output as before – but if we look at it in objdump we see that we are nearing our destination – as we’re now seeing the write function within the core C library itself.
000104cc <main>:
   104cc: e92d4800  push {fp, lr}
   104d0: e28db004  add fp, sp, #4
   104d4: e3a0200d  mov r2, #13
   104d8: e59f1010  ldr r1, [pc, #16] ; 104f0 <main+0x24>
   104dc: e3a00001  mov r0, #1
   104e0: eb005b84  bl 272f8 <__libc_write>
   104e4: e3a03000  mov r3, #0
   104e8: e1a00003  mov r0, r3
   104ec: e8bd8800  pop {fp, pc}
   104f0: 0005e6d8  .word 0x0005e6d8
If we really wanted to keep going, we could dig into the disassembly of libc (which on this Raspberry Pi lives in /lib/arm-linux-gnueabihf/libc.so.6). If we do that we eventually see <__write@@GLIBC_2.4> defined, which contains the line of assembler that shows we’re at the bottom of our rabbit hole. We see the assembly instruction svc 0x0000...; this is the supervisor call where we actually hand control over from our program to the OS to do the printing.
Linux System Calls
The POSIX standard defines the C-level interfaces (such as write()) that we can use. The latest version of that can be found here: https://pubs.opengroup.org/onlinepubs/9699919799/; but it’s not exactly easy reading. The numbered system calls (or syscalls) that sit underneath those interfaces are specific to Linux, and we can see a far easier list of them here: https://syscalls.kernelgrok.com/.
Right near to the top of the list we see #4 – sys_write; the parameters for which correspond to our write() function (as we’d expect, given that write() is just a wrapper for the syscall).
So let’s now go back to C and skip the middle-man, and make the system call directly. The C library provides us with a syscall() function to do this – so let’s take a look at this: man syscall…
SYSCALL(2) Linux Programmer's Manual SYSCALL(2)
NAME
syscall - indirect system call
SYNOPSIS
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <unistd.h>
#include <sys/syscall.h> /* For SYS_xxx definitions */
long syscall(long number, ...);
DESCRIPTION
syscall() is a small library function that invokes the
system call whose assembly language interface has the
specified number with the specified arguments.
Employing syscall() is useful, for example, when invoking
a system call that has no wrapper function in the C
library.
...
As we saw from the syscall table, there are four things we need to provide: the number of the syscall we want to use, the file descriptor of the place we’re writing to (here, the number 1 corresponding to stdout), the string itself, and the length of that string.
// hello3.c
#include <unistd.h>
#include <sys/syscall.h>

int main() {
    syscall(SYS_write, 1, "Hello World!\n", 13);
    return 0;
}
Compiling and running it, we see (once again) the same output.
Incidentally, if you could be bothered to track down sys/syscall.h and then follow the whole chain of headers that it includes (it’s quite a journey, so I’ll spare you the trouble), you’d eventually find SYS_write defined as having a value of 4 (which isn’t surprising, given that it is the syscall number shown in the table we saw earlier).
So we can (if we want to write really non-portable code) use:
// hello4.c
#include <unistd.h>
#include <sys/syscall.h>

int main() {
    syscall(4, 1, "Hello World!\n", 13);
    return 0;
}
Which will again do exactly the same thing.
Turning it back into assembly
If we build using the -static flag again, and use objdump -d one more time, we’ll finally see what is going on, and how the information gets passed to the OS; and finally get to see the actual syscall in the disassembly.
000104cc <main>:
   104cc: e92d4800  push {fp, lr}
   104d0: e28db004  add fp, sp, #4
   104d4: e3a0300d  mov r3, #13
   104d8: e59f2014  ldr r2, [pc, #20] ; 104f4 <main+0x28>
   104dc: e3a01001  mov r1, #1
   104e0: e3a00004  mov r0, #4
   104e4: eb005ee9  bl 28090 <syscall>
   104e8: e3a03000  mov r3, #0
   104ec: e1a00003  mov r0, r3
   104f0: e8bd8800  pop {fp, pc}
   104f4: 0005e718  .word 0x0005e718
...
00028090 <syscall>:
   28090: e1a0c00d  mov ip, sp
   28094: e92d00f0  push {r4, r5, r6, r7}
   28098: e1a07000  mov r7, r0
   2809c: e1a00001  mov r0, r1
   280a0: e1a01002  mov r1, r2
   280a4: e1a02003  mov r2, r3
   280a8: e89c0078  ldm ip, {r3, r4, r5, r6}
   280ac: ef000000  svc 0x00000000
   280b0: e8bd00f0  pop {r4, r5, r6, r7}
   280b4: e3700a01  cmn r0, #4096 ; 0x1000
   280b8: 312fff1e  bxcc lr
   280bc: ea000ac7  b 2abe0 <__syscall_error>
From here we have nearly everything we need to do this in assembler from scratch.
Since we’re trying to build the minimal working example, we’ll start by stripping out all of the parts that we don’t absolutely need out of the disassembly…
As I said before, all assembly language programming, regardless of the architecture, is about getting data from one place or another and storing it in one of our available registers. In ARM we have registers numbered from r0 to r15 (although some of these have special purposes). Looking at our code, we can see the move instruction mov is used a lot – with the destination first, then the source. We can see that we’re putting the length of the message into r3, the file descriptor for stdout into r1; and the syscall number into r0. We’re also loading (ldr) the address of the message into r2.
000104cc <main>:
   mov r3, #13
   ldr r2, [pc, #20] ; 104f4 <main+0x28>
   mov r1, #1
   mov r0, #4
   bl 28090 <syscall>
...
00028090 <syscall>:
   mov r7, r0
   mov r0, r1
   mov r1, r2
   mov r2, r3
   svc 0x00000000
We can also see that most of the <syscall> section is just moving data from one register to another before it’s used. So, given that, we can just put the data into the right registers to start with…
   mov r2, #13
   ldr r1, [pc, #20] ; 104f4 <main+0x28>
   mov r0, #1
   mov r7, #4
   svc 0x00000000
Now this is more like it! We’re down to just five lines of assembler: but to actually build it (and to actually define the message we want to print!) – we need to add just a little more code around it.
@ hello_a.s
.section .rodata
msg:
    .ascii "Hello World!\n"
    .align 2

.global _start
.text
_start:
    mov r0, #1      @ stdout
    ldr r1, =msg    @ location of msg
    mov r2, #13     @ length of msg
    mov r7, #4      @ write
    svc #0

    mov r7, #1      @ exit syscall
    mov r0, #0      @ return value
    svc #0
And so, now, we finish up with what is (I think) the minimum working example of a Hello World program in ARM assembler.
We start by defining our message (here labelled as msg) in a “read-only data” section, followed by an .align 2 directive (which on ARM pads to a 4-byte, i.e. 2^2, boundary so that whatever comes next is word-aligned); then we define our entry-point _start, which contains our short program. I’ve re-ordered the instructions into a more sensible order too. Lastly, to finish, we make a second syscall (#1) to exit the program cleanly. This latter syscall (also known as sys_exit) has just one parameter, which we set in r0 – and that’s the return value of the program. As is standard on POSIX systems, a 0 denotes a successful execution, and anything else represents an error condition.
And that’s it. Hello World in ARM assembler. Next time, we’ll look at some other syscalls, and have a go at writing a useful (if wildly impractical) program in assembler.
2 thoughts on “Understanding Hello World – on Raspberry Pi”
Filed under: ARM,Raspberry Pi - @ May 11, 2020 21:49
Tags: Assembler
Its interesting to see how ARM Assemblys changed over the years. Back in the RISC OS only days you’d be able to write :-
ADR R0, hello_world
SWI “OS_Write0”
.hello_world
EQUS “Hello World!”
ALIGN
👍🏻