OSD: Multitasking

Types of multitasking

Cooperative multitasking. When a task or thread has nothing to do, it yields by calling the scheduler function 'voluntarily'. Examples: Win16, old MacOS, simple kernels for embedded systems.

Preemptive multitasking. Task-switching is driven mainly by asynchronous interrupts from keyboard, timer, disk, serial port, etc. Examples: server OSes, most modern desktop OSes, realtime kernels.

TSS-based task-switching

xxx - reasons not to use it:

works only with x86 CPUs
TSS mechanism is not being improved as the x86 CPU evolves, so it gets slower compared to other task-switching methods in advanced x86 CPUs
less flexible that stack-swapping in terms of which registers get reloaded (especially CR3; the page table register)

Using `setjmp()` and `longjmp()` for task-switching

setjmp() stores the CPU state (i.e. the registers, including ESP) in a jmp_buf data structure. longjmp() loads the CPU registers from a jmp_buf.

In general, the jmp_buf struct does not contain the entire CPU state. glibc2, for example, saves only ESP, EIP, and the callee-save registers. This is enough state, however, that setjmp() and longjmp() can be used for coroutines (cooperatively-scheduled threads). See the code snippet below for an example.

It's possible to use these functions for preemptive multitasking if

EFLAGS is made part of the jmp_buf. This is needed to maintain the state of the interrupt enable bit for each thread, and it's a prerequisite for:
longjmp() must end with IRET. For an explanation of why this is necessary, see below. The use of IRET implies that:
The code to be run is ring 0 (kernel privilege) code
The other general-purpose registers (EAX, ECX, EDX) must be saved in the jmp_buf. It may or may not be necessary to save the remaining CPU registers. There is no need to save the floating-point registers if floating-point is not used, nor any need to save the segment registers if they are not modified.

None of these conditions is guaranteed by the C library specification. If you want to use setjmp() and longjmp() for preemptive multitasking, you should write your own versions of these functions.

Stack-swapping

Task-switching by swapping stacks stores the CPU registers on the stack, instead of in a data structure. There are two stack-swaps to consider: the switch between user stack and kernel stack when the kernel is entered or exited, and switching between different kernel stacks.

User-kernel stack-swapping (only for OSes with protection):

User-to-kernel stack-swap is done automatically by x86 interrupt mechanism: kernel SS and ESP are loaded from the current TSS, then user SS, user ESP, EFLAGS, CS, and EIP are pushed.
Kernel-to-user stack-swap is partially automatic: current kernel SS and ESP must be stored in TSS before IRET to user mode.
Kernel entry is caused by an interrupt, trap, or exception; kernel exit is done by IRET. Interrupts are asynchronous (preemptive), so all registers must be saved.

The use of IRET and the need to save all registers constrains the layout of registers on the stack. One possible layout:

    typedef struct {
        unsigned edi, esi, ebp, esp, ebx, edx, ecx, eax;/* pushed by common handler (PUSHA) */
        unsigned ds, es, fs, gs;                        /* pushed by common handler */
        unsigned which_int;                             /* pushed by stub */
        unsigned error_code;                            /* pushed by interrupt or stub */
        unsigned eip, cs, eflags, user_esp, user_ss;    /* pushed by interrupt */
    } uregs_t;

Swapping of kernel stacks:

EIP must be saved.
In a re-entrant kernel, the stack-swap function may be entered with interrupts enabled or disabled. Interrupts must be disabled while ESP is being altered, however, so EFLAGS must be saved.
For a kernel written in C, the callee-save registers must be saved.
As with kernel exit, the routine that swaps kernel stacks must also end with IRET

Interrupts may be enabled upon leaving the kernel stack-swap routine. Enabling interrupts (loading the EFLAGS register) must be simultaneous with jumping to the new thread (loading the EIP register). The only instruction that can load both registers at the same time is IRET.

Example layout of registers on the stack:

    typedef struct {
        unsigned ebx, esi, edi, ebp;    /* callee-save */
        unsigned eip, eflags;           /* CALL/IRET */
    } kregs_t;

The same register frame can be used for both types of stack-swapping, i.e. use uregs_t where you would use kregs_t. This may simplify the kernel code somewhat, but is slower.

Code snippets

Cooperative multitasking with setjmp() and longjmp()

Links

'An Introduction to Programming with Threads': http://guir.cs.berkeley.edu/projects/osprelims/papers/threads.ps.gz. Starts out with description of threads and reasons to use them, then continues with mutexes, condition variables, and other synchronization stuff.

Tru64 UNIX: Guide to the Posix Threads Library

IBM's AS/400 Posix Threads documentation (not as detailed as the Tru64 Unix doc)

TO DO

- scheduling algorithms: run-until-completion (FIFO/batch), simple round
  robin, round robin with priorities, STCF (optimal?), priority queues
- fork(): duplicate current task (new address space; same executable)
- exec(): reload existing address space (same address space; new executable)
- run()/spawn(): create and run new task (new address space; new executable)