
OS-dependent functions in the C library

These functions must be rewritten or 'ported' to your own OS. They include nearly all of the functions declared in these files:
        stdio.h         fcntl.h         io.h/unistd.h
        signal.h        process.h       dir.h/dirent.h/direct.h
The heap memory functions malloc(), calloc(), realloc() and free(), in stdlib.h, may also need to be changed.

In a well-written library, all of these changes will be confined to a relatively small number of files where the libc-to-OS interface occurs.
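
As a rough illustration of that confinement, the sketch below puts every OS-specific detail behind a single hook and builds a portable function on top of it. The names os_write(), my_puts(), and the console buffer are hypothetical and chosen only for this example; a real port would replace the body of os_write() with a system call or a call into the console driver.

```c
#include <stddef.h>
#include <string.h>

/* --- OS-dependent part: the only code that changes when porting ------- */
/* Hypothetical hook. A real OS would issue a system call or call its
   console driver here; this stand-in appends to a memory buffer. */
char   console[256];
size_t console_len;

long os_write(int fd, const void *buf, size_t len)
{
    (void)fd;                           /* only one "device" in this sketch */
    if (len > sizeof(console) - console_len)
        len = sizeof(console) - console_len;    /* short write when full */
    memcpy(console + console_len, buf, len);
    console_len += len;
    return (long)len;
}

/* --- Portable part: never touches the OS directly --------------------- */
int my_puts(const char *s)
{
    size_t len = strlen(s);

    if (os_write(1, s, len) != (long)len)
        return -1;
    if (os_write(1, "\n", 1) != 1)
        return -1;
    return 0;
}
```

With this split, porting my_puts() to another OS means rewriting os_write() and nothing else.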

OS-independent (portable) functions in the C library

Not all libc functions must be ported. Most of the functions in string.h and ctype.h can be used unchanged. If the compiler has 'built-in' string functions, they should also be safe to use.

Problematic functions in the C library

The sprintf() function in stdio.h does not call the OS, but it is still problematic. All of the ...printf()s are huge monolithic functions that depend on a lot of other modules in the C library, making them difficult to port (this is one advantage of C++ iostreams). It may be best to write your own lightweight/stripped-down printf().
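
A stripped-down formatter can be quite small. The following is a minimal sketch, not a replacement for the standard function: the names mini_vsnprintf() and mini_snprintf() are made up for this example, and only %c, %s, %d, %u, %x, and %% are handled, with no field widths, precision, or floating point.

```c
#include <stdarg.h>
#include <stddef.h>

/* Store one character if it fits; always count it (as snprintf does). */
static void emit(char *buf, size_t size, size_t *pos, char c)
{
    if (*pos + 1 < size)
        buf[*pos] = c;
    (*pos)++;
}

/* Emit an unsigned number in base 10 or 16. */
static void emit_num(char *buf, size_t size, size_t *pos,
                     unsigned long val, unsigned base)
{
    char tmp[sizeof(unsigned long) * 8];    /* more digits than ever needed */
    int n = 0;

    do {
        tmp[n++] = "0123456789abcdef"[val % base];
        val /= base;
    } while (val != 0);
    while (n > 0)                           /* digits were produced backwards */
        emit(buf, size, pos, tmp[--n]);
}

int mini_vsnprintf(char *buf, size_t size, const char *fmt, va_list ap)
{
    size_t pos = 0;

    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%') {
            emit(buf, size, &pos, *fmt);
            continue;
        }
        fmt++;                              /* look at the conversion letter */
        switch (*fmt) {
        case 'c':
            emit(buf, size, &pos, (char)va_arg(ap, int));
            break;
        case 's': {
            const char *s = va_arg(ap, const char *);
            while (*s != '\0')
                emit(buf, size, &pos, *s++);
            break;
        }
        case 'd': {
            int v = va_arg(ap, int);
            unsigned long u = (unsigned long)v;
            if (v < 0) {
                emit(buf, size, &pos, '-');
                u = 0UL - u;                /* magnitude, safe for INT_MIN */
            }
            emit_num(buf, size, &pos, u, 10);
            break;
        }
        case 'u':
            emit_num(buf, size, &pos, va_arg(ap, unsigned), 10);
            break;
        case 'x':
            emit_num(buf, size, &pos, va_arg(ap, unsigned), 16);
            break;
        case '%':
            emit(buf, size, &pos, '%');
            break;
        case '\0':
            fmt--;                          /* lone '%' at end: drop it, stop */
            break;
        default:                            /* unknown conversion: copy as-is */
            emit(buf, size, &pos, '%');
            emit(buf, size, &pos, *fmt);
            break;
        }
    }
    if (size > 0)
        buf[pos < size ? pos : size - 1] = '\0';
    return (int)pos;                        /* length the full output needs */
}

int mini_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = mini_vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

The only header it needs is stdarg.h, which is supplied by the compiler rather than by the C library, so the code has no dependency on the rest of libc.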

Should I write libc functions in C or assembler?

Language              Portable code?                  Fast code?  Small functions   Preprocessor?
                                                                  can be inlined?
C                     Yes                             No          Yes, with GCC     Yes
Inline assembler      No; same CPU and compiler only  Yes         Yes, with GCC     Same preprocessor as compiler
Non-inline assembler  No; same CPU and compatible     Yes         No                No preprocessor, or a different
                      linker only                                                   preprocessor than the compiler [*]

[*] With care, the GNU C preprocessor can be used with almost any assembler. Your makefiles must be rewritten so that assembly is a two-step process, with separate filename extensions for the initial asm file and the preprocessed file. GNU software frequently uses the extension '.S' for asm that must still be run through the preprocessor, and '.s' for asm that has already been preprocessed. This causes problems on DOS systems, where filenames are not case-sensitive and the two extensions name the same file.

Interfacing HLL code with asm

C calling convention - standard stack frame

Arguments passed to a C function are pushed onto the stack, right to left, before the function is called. The first thing the called function does is push the (E)BP register, then copy (E)SP into it. This creates a data structure called the standard C stack frame.

Create standard stack frame, allocate 16 bytes for local variables, save registers:

        32-bit code         16-bit code: TINY,      16-bit code: MEDIUM,
                            SMALL, or COMPACT       LARGE, or HUGE
                            memory models           memory models

        push ebp            push bp                 push bp
        mov ebp,esp         mov bp,sp               mov bp,sp
        sub esp,16          sub sp,16               sub sp,16
        push edi            push di                 push di
        push esi            push si                 push si
        ...                 ...                     ...

Restore registers, destroy stack frame, and return:

        ...                 ...                     ...
        pop esi             pop si                  pop si
        pop edi             pop di                  pop di
        mov esp,ebp         mov sp,bp               mov sp,bp
        pop ebp             pop bp                  pop bp
        ret                 ret                     retf

Size of 'slots' in stack frame, i.e. stack width:

        32 bits             16 bits                 16 bits

Location of stack frame 'slots':

        [ebp + 8]           [bp + 4]                [bp + 6]
        [ebp + 12]          [bp + 6]                [bp + 8]
        [ebp + 16]...       [bp + 8]...             [bp + 10]...

If an argument passed to a function is wider than the stack, it will occupy more than one 'slot' in the stack frame. A 64-bit value passed to a function (long long or double) will occupy 2 stack slots in 32-bit code or 4 stack slots in 16-bit code.

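Function arguments are accessed with positive offsets from the BP or EBP registers. Local variables are accessed with negative offsets. The previous value of BP or EBP is stored at [bp + 0] or [ebp + 0]. The return address (IP or EIP) is stored at [bp + 2] or [ebp + 4]. In the 16-bit MEDIUM, LARGE, and HUGE memory models the return address is a far pointer (IP at [bp + 2], CS at [bp + 4]), which is why the first argument is at [bp + 6] instead of [bp + 4].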
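
The slot arithmetic can be written out as a small calculation. This is only a sketch of the rules described above; the helper names slots_for() and arg_offset() are invented for the example. Each argument is rounded up to a whole number of slots, and the first argument starts at offset 8 in 32-bit code, 4 in 16-bit code with near calls, or 6 in 16-bit code with far calls.

```c
#include <stddef.h>

/* Number of stack slots an argument of arg_size bytes occupies. */
size_t slots_for(size_t arg_size, size_t slot_size)
{
    return (arg_size + slot_size - 1) / slot_size;
}

/* Offset from (E)BP of argument number `index` (0-based).
   first_arg is the offset of the first argument:
   8 for 32-bit code, 4 for 16-bit near calls, 6 for 16-bit far calls. */
size_t arg_offset(const size_t *arg_sizes, size_t index,
                  size_t slot_size, size_t first_arg)
{
    size_t off = first_arg;

    for (size_t i = 0; i < index; i++)
        off += slots_for(arg_sizes[i], slot_size) * slot_size;
    return off;
}
```

For a function taking a 64-bit value followed by an int, this places the second argument at [ebp + 16] in 32-bit code: the 64-bit value fills the slots at [ebp + 8] and [ebp + 12].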

C calling convention - return values

A C function usually stores its return value in one or more registers.
                        32-bit code       16-bit code, all memory models
8-bit return value      AL                AL
16-bit return value     AX                AX
32-bit return value     EAX               DX:AX
64-bit return value     EDX:EAX           hidden pointer
128-bit return value    hidden pointer    hidden pointer

Hidden pointer: space for the return value is allocated on the stack of the calling function, and a 'hidden' pointer to this space is passed to the called function.

C calling convention - saving registers

GCC expects functions to preserve the callee-save registers:
        EBX, EDI, ESI, EBP, DS, ES, SS
You need not save these registers:
        EAX, ECX, EDX, FS, GS, EFLAGS, floating point registers
In some OSes, FS or GS may be used as a pointer to thread local storage (TLS), and must be saved if you modify it.

C calling convention - leading underscores

Some C compilers (those for DOS and Windows, and those with COFF output) prepend an underscore to the names of C functions and global variables. If a C global variable, e.g. conv_mem_size, is accessed by asm code, it should be declared with a leading underscore in the asm code:
EXTERN _conv_mem_size      ; NASM syntax
        mov [_conv_mem_size],ax
Linux ELF does NOT use underscores. Watcom C uses trailing underscores for function names, and leading underscores for global variables.

If your GCC supports it, leading underscores can be turned off with the compiler option -fno-leading-underscore
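
To find out from C code which convention a compiler uses, GCC and Clang predefine the macro __USER_LABEL_PREFIX__, which expands to the prefix (either an underscore or nothing). The sketch below turns it into a string. It assumes a compiler that defines this macro; the helper names label_prefix() and uses_leading_underscore() are invented for the example, and "?" is returned where the macro is missing.

```c
#include <string.h>

/* Two-step stringification, so the macro is expanded before it is quoted. */
#define STR2(x) #x
#define STR(x)  STR2(x)

/* __USER_LABEL_PREFIX__ is predefined by GCC and Clang.
   Other compilers may not define it. */
#ifdef __USER_LABEL_PREFIX__
#define LABEL_PREFIX STR(__USER_LABEL_PREFIX__)
#else
#define LABEL_PREFIX "?"                /* unknown for this compiler */
#endif

/* Prefix the compiler puts in front of C symbol names: "_" or "". */
const char *label_prefix(void)
{
    return LABEL_PREFIX;
}

/* Nonzero if asm code must write _conv_mem_size to reach conv_mem_size. */
int uses_leading_underscore(void)
{
    return strcmp(LABEL_PREFIX, "_") == 0;
}
```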

Pascal calling conventions

Function arguments are pushed onto the stack from left to right before the function is called. C-style variable-length argument lists are not possible in Pascal. (Look in file STDARG.H and think about it.)

In C, the calling function must 'clean up the stack' (remove function arguments from the stack after the called function returns). In Pascal, the called function must do this, before returning.
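
The two C rules work together to make variable-length argument lists possible. Because arguments are pushed right to left, the first named argument is always at a known offset from (E)BP, however many arguments follow it. Because only the caller knows how many arguments it pushed, only the caller can remove them. A minimal illustration (sum_ints() is a made-up example function):

```c
#include <stdarg.h>

/* Add up `count` int arguments. The function itself has no way of
   knowing how many arguments were pushed except through `count`,
   so it could not clean up the stack reliably on its own. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);                /* start just past the named argument */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);       /* step to the next stack slot */
    va_end(ap);
    return total;
}
```

Each call site may pass a different number of arguments, e.g. sum_ints(3, 1, 2, 3) and sum_ints(0), and each call site removes exactly what it pushed.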

Pascal identifiers are case-insensitive. MyKewlProc() will be stored in the object code file as MYKEWLPROC

Other calling conventions

The __stdcall calling convention, used by Windows, is a hybrid of the C and Pascal calling conventions. Like C, function arguments are pushed right-to-left. Like Pascal, the called function must clean up the stack. Exception: the caller must clean up the stack for functions that accept a variable number of arguments, e.g. printf(const char *format, ...);

Watcom C uses a register-based calling convention. See sections 7.4, 7.5, 10.4, and 10.5 in cuserguide.pdf in the Watcom documentation. Individual functions can be declared to use the normal, stack-based calling convention.

GCC can be made to use a register calling convention by compiling with gcc -mregparm=NNN ...
See the GCC documentation for details.

Examples

32-bit asm code called by C

; C prototype ('extern' and parameter names 'arg1' and 'arg2' are optional):
; extern unsigned long long shr64(unsigned long long arg1, int arg2);

BITS 32
SECTION .text

GLOBAL _shr64                   ; omit the underscores for Linux ELF
_shr64: push ebp
            mov ebp,esp
            ; push ecx          ; not needed: ECX is 'caller-save' for GCC
            mov ecx,[ebp + 16]  ; ECX=arg2, at slot #3
            mov eax,[ebp + 8]   ; EDX:EAX=arg1, at slot #1...
            mov edx,[ebp + 12]  ; ...and slot #2
            jecxz done          ; arg2 == 0: skip the loop (LOOP would wrap ECX)
again:      shr edx,1
            rcr eax,1           ; EDX:EAX >>= 1, repeated ECX times
            loop again
done:       ; pop ecx
        pop ebp
        ret ; 64-bit return value in EDX:EAX
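
For comparison, here is a C model of the same shift loop, useful for checking the asm routine against known values. It is a sketch under stated assumptions: the name shr64_model() is invented for this example, and the variables eax and edx simply stand in for the registers. Unlike a bare LOOP instruction, the while loop here does nothing when the count is zero.

```c
#include <stdint.h>

uint64_t shr64_model(uint64_t arg1, int arg2)
{
    uint32_t eax = (uint32_t)arg1;              /* low half,  slot #1 */
    uint32_t edx = (uint32_t)(arg1 >> 32);      /* high half, slot #2 */

    while (arg2-- > 0) {
        uint32_t carry = edx & 1;               /* bit SHR EDX,1 moves into CF */
        edx >>= 1;
        eax = (eax >> 1) | (carry << 31);       /* RCR EAX,1 brings CF in on top */
    }
    return ((uint64_t)edx << 32) | eax;         /* result as EDX:EAX */
}
```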

16-bit asm code called by C

; C prototype:
; extern unsigned long shr32(unsigned long arg1, int arg2);

SEGMENT _TEXT PUBLIC CLASS=CODE

GLOBAL _shr32
_shr32: push bp
            mov bp,sp
            push cx
                mov cx,[bp + 8] ; CX=arg2, at slot #3
                mov ax,[bp + 4] ; DX:AX=arg1, at slot #1...
                mov dx,[bp + 6] ; ...and slot #2
                jcxz done       ; arg2 == 0: skip the loop (LOOP would wrap CX)
again:          shr dx,1        ; DX:AX >>= 1, repeated CX times
                rcr ax,1
                loop again
done:       pop cx
        pop bp
        ret ; 32-bit return value in DX:AX

Libc code snippets

Assembly-language macros EXP and IMP, to interface asm code to C with or without DOS/Windows/COFF underscores. The first set of macros below is for the GNU assembler (as); the second set is for NASM:
	.ifdef UNDERBARS

	.macro EXP sym
		.global \sym
		\sym:
		.global _\sym
		_\sym:
	.endm
	.macro IMP sym
		.extern _\sym
		.equ \sym,_\sym
	.endm

	.else

	.macro EXP sym
		.global \sym
		\sym:
	.endm
	.macro IMP sym
		.extern \sym
	.endm

	.endif

	%ifdef UNDERBARS

	%macro EXP 1
		GLOBAL _%1
		_%1:
		GLOBAL $%1
		$%1:
	%endmacro
	%macro IMP 1
		EXTERN _%1
		%define %1 _%1
	%endmacro

	%else

	%macro EXP 1
		GLOBAL $%1
		$%1:
	%endmacro
	%macro IMP 1
		EXTERN $%1
	%endmacro

	%endif
If the assembly-language code is meant for use with DOS, Windows, or a compiler that produces COFF output, use the leading underscores:
	nasm -dUNDERBARS=1 ...
	as --defsym UNDERBARS=1 ...
ELF systems (e.g. Linux) do not require leading underscores.

Links

Some open source C libraries:

'String functions that meet the needs of i[45]86', by Ulrich Drepper. Read this if you want to write your own string.h functions.

A good C (and C++) standard reference is at: http://www.dinkumware.com/htm_cl/index.html

The Better String library for C (bstrlib): http://bstring.sf.net/

TO DO

- The unit of linkage is the module. For C, module == file. Put each
  function into its own file to prevent bloat (linking of unrelated
  and unnecessary functions).
