OS basics and exploits

From lecture notes on low-level programming

Date | |
---|---|
Author | @asyncze |
An operating system (or simply OS) provides services to programs. The services are exposed through software abstractions that make it easier to manipulate low-level devices and protect the hardware. A shell is the command-line interface to the OS (some services can also be accessed via library functions).
A shell program is basically an interpreter for shell commands:

- on open, the shell reads each character into a register as it is typed and stores the command in memory (to be executed)
- it loads executable files, such as `./hello`, which is the command to execute the program `hello`; loading basically copies the code and data in the file from disk to memory
- it runs the `main` routine, which corresponds to the function `main` in most programs; data is then copied from memory to registers (for faster execution), and finally from registers to the display device to output some result

A shell command is a command-line program, such as `ls | sort` (sort listed files) and `cp` (copy files). In C, `system()` is a library function to execute shell commands, e.g. `system("ls | sort")`.
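As a minimal sketch of calling `system()` from C, the helper below runs a command through the shell and reports its exit code. The name `run_shell_command` is hypothetical and only used for illustration.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command with system() and return its exit code.
 * Returns -1 if the shell could not be started or the command
 * did not exit normally (for example, it was killed by a signal). */
int run_shell_command(const char *command) {
    int status = system(command);
    if (status == -1 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

For example, `run_shell_command("ls | sort")` prints the sorted listing and returns the exit code of the pipeline.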
A system call, or syscall, not to be confused with the `system()` library function, is a function provided by the kernel that programs call to request services from the operating system, such as accessing disk drives or creating processes. In the C programming language, system calls such as `write` are often wrapped in other functions, such as `printf` (using the wrapped versions is usually better).
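The difference between calling `write` directly and going through `printf` can be sketched as follows. The function names are hypothetical; both print the same message to standard output.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Send the message straight to the kernel through the write()
 * system call wrapper. Returns the number of bytes written,
 * or -1 on error. */
ssize_t print_with_write(const char *message) {
    return write(STDOUT_FILENO, message, strlen(message));
}

/* Send the same message through the buffered stdio layer.
 * printf() formats into a user-space buffer and eventually calls
 * write() itself; fflush() forces that to happen now.
 * Returns the number of characters printed. */
int print_with_printf(const char *message) {
    int written = printf("%s", message);
    fflush(stdout);
    return written;
}
```

The wrapped version adds formatting and buffering, which is one reason it is usually the better choice.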
A system is vulnerable to environment attacks when programs call external commands to carry out tasks, such as via `system()` library calls, `popen()` to open a process, or `execlp()` and `execvp()`, which use the `PATH` environment variable to locate executables. The `PATH` substitution attack targets programs that run commands without a complete path. The attacker modifies the `PATH` variable so that the program runs the attacker's script instead, or the `HOME` variable to control the execution of commands (such as when accessing files).
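One possible mitigation, sketched here under the assumption of a POSIX system, is to run external programs by absolute path with a fixed environment, so that neither `PATH` nor `HOME` from the caller is trusted. The helper name `run_with_clean_env` and the environment value are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a program by absolute path with a fixed, minimal environment.
 * execve() does not search PATH, so a modified PATH or HOME in the
 * caller's environment cannot redirect it to another executable.
 * Returns the child's exit code, or -1 on failure. */
int run_with_clean_env(const char *absolute_path, char *const argv[]) {
    /* Hypothetical trusted environment; adjust for the target system. */
    char *const clean_env[] = { "PATH=/usr/bin:/bin", NULL };

    /* Refuse relative names that would need a PATH search. */
    if (absolute_path == NULL || absolute_path[0] != '/') {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        execve(absolute_path, argv, clean_env);
        _exit(127); /* only reached if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Compared with `system("ls")`, this avoids both the shell and the `PATH` lookup, at the cost of more code.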
A system is vulnerable to input argument attacks when programs are supplied arguments via some input (command-line or otherwise). If the program passes the input unsanitized to a shell, the user-provided input can be used to inject commands, such as the dreaded `./program "; rm -rf /"` (call `program` and then forcefully delete everything under the root directory `/`).
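A minimal sketch of input sanitization is an allowlist check that runs before an argument is ever placed in a command. The function name `is_safe_argument` and the chosen character set are assumptions for illustration; a real program would pick an allowlist that fits its inputs. Note that this check alone does not reject `..`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true only if the argument consists of characters from a
 * conservative allowlist (letters, digits, '.', '_' and '-').
 * Shell metacharacters such as ';', '|', '&', '$' and spaces are
 * rejected, so an input like "; rm -rf /" is refused. */
bool is_safe_argument(const char *argument) {
    if (argument == NULL || argument[0] == '\0') {
        return false;
    }
    /* A leading '-' could be interpreted as an option by the callee. */
    if (argument[0] == '-') {
        return false;
    }
    for (size_t i = 0; argument[i] != '\0'; i++) {
        unsigned char c = (unsigned char)argument[i];
        if (!isalnum(c) && c != '.' && c != '_' && c != '-') {
            return false;
        }
    }
    return true;
}
```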
It is also possible to traverse directories, such as with the `..` attack, overflow buffers, and perform format string attacks via bad inputs. To mitigate these, always check the size when copying inputs into buffers (use library functions such as `snprintf` to limit the size to `n`) and always sanitize user-provided inputs.
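The size check with `snprintf` can be sketched like this. The helper name `copy_input` is hypothetical. Passing the input through a fixed `"%s"` format, rather than using it as the format string, also avoids format string attacks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Copy input into a fixed-size buffer without overflowing it.
 * snprintf() writes at most size - 1 characters plus a terminating
 * '\0' and returns the length the full string would have had, so a
 * return value >= size means the input was truncated.
 * Returns true only if the whole input fit. */
bool copy_input(char *buffer, size_t size, const char *input) {
    if (buffer == NULL || size == 0 || input == NULL) {
        return false;
    }
    int needed = snprintf(buffer, size, "%s", input);
    return needed >= 0 && (size_t)needed < size;
}
```

A caller can treat a `false` result as invalid input instead of continuing with a truncated value.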
A process is an abstraction for the processor, memory, and I/O devices, and represents a running program. Its context is the state information the process needs to run. The processor switches between multiple running programs, such as a shell and some other program, using context switching, which saves the state of the current process and restores the state of the other process. Threads are multiple execution units within a process with access to the same code and global data.
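Unlike threads, separate processes each have their own copy of global data. A rough sketch using `fork()` (the names `counter` and `child_counter_after_increment` are hypothetical) shows that a change made in a child process is not visible in the parent:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 0; /* global data, one copy per process */

/* Fork a child that increments its own copy of counter and reports
 * the value through its exit code. Returns the child's value, or -1
 * on failure. The parent's counter is left unchanged. */
int child_counter_after_increment(void) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        counter++;      /* modifies the child's copy only */
        _exit(counter);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

Had the increment run in a thread of the same process instead, the parent would see the new value.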
A kernel is a collection of code and data structures that is always in memory. It manages all processes and is accessed via system calls, which transfer control to the kernel temporarily to perform some action, such as reading from or writing to a file.
Virtual memory is an abstraction for main memory and local disks. It provides each program with its own virtual address space so that program code and data begin at the same fixed address for all processes (at the bottom of the virtual address space).
The heap is used for dynamic memory allocation and expands and contracts at run-time through library functions such as `malloc` and `free` (used for larger and more flexible memory management). Shared libraries, such as the C standard library, are typically near the middle of the virtual address space.
The stack is at the top of the user section of the virtual address space and is used to store local variables within function calls. It expands and contracts dynamically at run-time: the stack grows with each function call and contracts on return.
The kernel virtual memory is at the top of the virtual address space and is reserved for kernel functions (programs must call kernel functions to read or write in this space).
An address space is the range of addresses available to some process. Here's an example program written in the C programming language with locations in address space (note that top-most is lower addresses and bottom is higher addresses).
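
```c
#include <stdlib.h>

int z;      /* .bss (uninitialized global) */
int w = 10; /* .data (initialized global) */

int main() {
    int x;          /* stack */
    int y = 42;     /* stack */
    char *p;        /* stack */
    p = malloc(42); /* pointer in stack and allocation in heap */
    if (p != NULL) {
        *p = 42;    /* write 42 to the allocated address in heap */
        free(p);
    }
    return 0;
}
```
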
Description | Address space |
---|---|
user space | |
`.text`, program code | |
`.data`, initialized global data | `w` |
`.bss`, uninitialized global data | `z` |
heap | `malloc(42)` |
memory mapped region for large chunks of memory, such as shared libraries (text, data, `printf`) | ... |
stack | `x` |
stack | `y` |
stack | `*p` |
stack | `p` (points to address in heap) |
stack, growing from higher to lower addresses | `*p = 42` (write 42 in address in heap) |
kernel space | |
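To see the regions on a real system, a small sketch can print one address from each of them. The function name `print_layout` is hypothetical, and the actual values vary between runs and platforms (for example because of address space layout randomization), so no particular output is assumed.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int z;      /* .bss */
int w = 10; /* .data */

/* Print the address of one object from each region.
 * Returns 0 on success, -1 if the allocation failed. */
int print_layout(void) {
    int x = 0;            /* stack */
    char *p = malloc(42); /* pointer on the stack, allocation on the heap */
    if (p == NULL) {
        return -1;
    }
    printf(".text  %p\n", (void *)(uintptr_t)&print_layout);
    printf(".data  %p\n", (void *)&w);
    printf(".bss   %p\n", (void *)&z);
    printf("heap   %p\n", (void *)p);
    printf("stack  %p\n", (void *)&x);
    free(p);
    return 0;
}
```

On a typical Linux system the printed addresses increase roughly in the order shown in the example above, with the stack far above the heap.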
A buffer overflow can occur when programs attempt to store more elements in a buffer (set of memory locations) than has been allocated. Note that some programming languages have overflow detection (C and C++ do not).
Here's an example program written in the C programming language that is vulnerable to a buffer overflow attack (running this program could result in overflow of `buffer_1` into `buffer_2`).

```c
int main() {
    int i;
    char buffer_2[4];
    char buffer_1[6];
    /* BUG (intentional): the loop writes 10 bytes into the 6-byte buffer_1 */
    for (i = 0; i < 10; i++) {
        buffer_1[i] = 'A';
    }
    /* buffer_2[4] -> AAAA */
    /* buffer_1[6] -> AAAAAA */
    return 0;
}
```

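A bounded version of the same loop, as a minimal sketch, takes its limit from the real size of the buffer instead of a hard-coded count. The helper name `fill_buffer` is hypothetical.

```c
#include <stddef.h>

/* Fill a buffer with 'A' without writing past its end: the loop
 * bound is the smaller of the requested count and the buffer size.
 * Returns the number of bytes actually written. */
size_t fill_buffer(char *buffer, size_t size, size_t requested) {
    size_t count = requested < size ? requested : size;
    for (size_t i = 0; i < count; i++) {
        buffer[i] = 'A';
    }
    return count;
}
```

Called as `fill_buffer(buffer_1, sizeof buffer_1, 10)`, it writes only 6 bytes and leaves `buffer_2` untouched.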
A file is an abstraction for some I/O device. It provides a uniform view of devices, and most input/output in a system is just reading from and writing to files.
In UNIX-based systems, each process is assigned a real UID/GID (user ID/group ID), which identifies the user who initiated the process (the process owner), an effective UID/GID (or EUID), which is used to determine permissions, and a saved UID/GID (or SUID), which is used to drop and regain privileges.
A program with the SUID bit set has the effective UID/GID changed to that of the program owner. File permissions are set using shell commands and displayed as in the following listing:

`- | rwx | r-x | r-x root root file`

- the first `-` is the file type, where `-` is a regular file, `d` is a directory, and `l` is a symbolic link
- `rwx` sets read-write-execute permissions for the first `root` (file owner)
- the first `r-x` sets read-execute permissions for the second `root` (group owner)
- the last `r-x` sets read-execute permissions for all others

The kernel checks the EUID when a user tries to write to a file (root is the most privileged user). Setting the set-UID bit with `chmod 4755 <file>`, which replaces `rwx` with `rws` in the owner permissions, makes the file run with the privileges of the file owner instead of the invoking user (a potential security flaw).
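A program can inspect these values itself. The sketch below, with hypothetical function names, tests a file mode for the set-UID bit and compares the real and effective UID of the running process.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* True if the set-UID bit (the leading 4 in chmod 4755) is set
 * in a file mode, such as st_mode returned by stat(). */
bool has_setuid_bit(mode_t mode) {
    return (mode & S_ISUID) != 0;
}

/* True if the process runs with an effective UID that differs from
 * the real UID of the invoking user, as happens when a set-UID
 * program owned by another user is executed. */
bool running_with_changed_privileges(void) {
    return getuid() != geteuid();
}
```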
A system is vulnerable to file access attacks when programs create or use files on the system. Always check that the file exists and is not a symbolic link.
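One way to perform this check, sketched here under the assumption of a POSIX.1-2008 system, is to let `open()` refuse symbolic links with `O_NOFOLLOW` and then verify the file type on the open descriptor. The helper name `open_regular_file` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open an existing regular file for reading, refusing symlinks.
 * O_NOFOLLOW makes open() fail if the final path component is a
 * symbolic link. fstat() then checks the type of the file that was
 * actually opened, so the file cannot be swapped between the check
 * and the use. Returns a file descriptor, or -1 on failure. */
int open_regular_file(const char *path) {
    int fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0) {
        return -1; /* missing file, symlink, or no permission */
    }

    struct stat info;
    if (fstat(fd, &info) < 0 || !S_ISREG(info.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Checking the descriptor rather than the path is a deliberate choice: a separate `lstat()` followed by `open()` would leave a window in which an attacker could replace the file.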