Computers and Compilation

From lecture notes on low-level programming
Author @asyncze

What is a program? #

A computer program starts out as a source file, which is a sequence of bits organised as 8-bit bytes, where each byte represents a text character (ASCII standard). Most programs are written in high-level programming languages, then compiled (translated) into executable object files. An executable is loaded into memory, runs as a process on the processor, and finally terminates.

An object file contains program code in binary format, together with any library code that has been linked into it. It also contains relocation information (addresses that need to be fixed once the code is placed in memory), a symbol table (symbols defined or referenced by the object), and optional debugging information.

Computer systems #

A modern computer is a system of buses, I/O devices, memory, and processor.

Buses are collections of electrical conduits that carry bytes between components. They usually transfer fixed-size chunks of bytes referred to as words, where the word size is typically 4 bytes (32 bits) or 8 bytes (64 bits).

I/O devices are the connections to the external world, e.g. keyboards, mice, screens, and disk drives (HDD/SSD). Disk drives are where executable files are stored. The motherboard (main circuit board) and many devices contain controller chips, which control data transfers to and from the devices. External cards plugged into the motherboard have adapter chips, which translate between different types of interfaces.

Memory (main memory) is a temporary storage device that holds program code and data while the program is executed by the processor. Physically, memory is usually a collection of dynamic random-access memory (DRAM) chips, and logically, it is a linear array of bytes, each with its own unique address (array index).

The processor, or central processing unit (CPU), is the engine that interprets and executes the machine-level instructions stored in memory. The CPU has a word-size storage device (a register) called the program counter (PC) that points to some instruction in memory. Instructions are executed in a strict sequence: read and interpret the instruction pointed to by the PC, perform the operation it specifies, and then update the PC to point to the next instruction. Each instruction has its own unique address in memory.

CPU operations #

CPU operations use the register file (a small storage device made up of word-size registers) and the arithmetic logic unit (ALU) to compute new data and address values.

Cache memory #

Cache memory refers to storage devices at various levels of accessibility (and size) that temporarily hold data the processor is likely to need. A smaller memory device is generally faster than a larger one. Programs run faster when the data they use can be served from the smallest, fastest level possible.

Level  Type                      Data
L0     Register                  CPU registers hold words copied from cache memory
L1     SRAM                      L1 cache holds cache lines copied from L2
L2     SRAM                      L2 cache holds cache lines copied from L3
L3     SRAM                      L3 cache holds cache lines copied from main memory
L4     DRAM                      Main memory holds disk blocks copied from local disks
L5     Local secondary storage   Local disks hold files
L6     Remote secondary storage

The different levels of memory are together referred to as the memory hierarchy, and each level is typically characterised by its response time. Each upper level serves as a cache for the level below it. The top-most level (the registers) is reserved for the processor. A cache line consists of multiple words, and disk blocks are larger portions of data copied from local disk drives.

Compilation process #

The compilation process, in this case for a program written in the C programming language and using GCC, consists of four stages: preprocessor, compiler, assembler, and linker. Each stage is a separate program, and together they translate program code into a sequence of machine-language instructions, which are then packed into an executable object file. The whole process is often referred to as just compilation, and the set of programs as the compiler.

The preprocessor modifies the program code based on directives that begin with #, such as #include <stdio.h>, which inserts the contents of a header file into the program text. The result is an intermediate file with .i suffix (use flag -E to see this output).

The compiler translates the preprocessed intermediate file into an assembly program with .s suffix. Each line in an assembly program describes one instruction (use flag -S to see this file).

The assembler translates the assembly instructions into machine-level instructions (use flag -c to stop after this stage). The result is a binary relocatable object file with .o suffix.

The linker merges one or more relocatable object files, usually to include separately compiled .o files that the program uses, such as printf.o. Basically, a linker resolves references to external objects such as variables and functions (e.g. printf). Static linking is performed at build time, whereas dynamic linking is performed when the program is loaded or while it runs. The result is an executable file, which by convention has no suffix on Unix-like systems. An executable file is ready to be loaded into memory and executed by the system.

Here's a simple example program.

/* hello.c */
#include <stdio.h>

int main() {
    printf("Hello C\n");
    return 0;
}

And here are the resulting assembly instructions. Note that operand size suffixes and special assembler directives are omitted for clarity.

.section __TEXT
    .globl  _main

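_main:
    push    %rbp                    # save the caller's frame pointer
    mov     %rsp, %rbp              # set up the frame pointer for main
    sub     $16, %rsp               # reserve 16 bytes on the stack
    mov     $0, -4(%rbp)            # zero a compiler-generated stack slot
    lea     L_.str(%rip), %rdi      # first argument: address of the string
    mov     $0, %al                 # no vector registers used in the variadic call
    call    _printf
    xor     %eax, %eax              # return value 0
    add     $16, %rsp               # release the stack space
    pop     %rbp                    # restore the caller's frame pointer
    ret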

.section __TEXT
    L_.str: .asciz  "Hello C\n"

Useful software tools #