Bugs in software development tools
- NASM version 0.98 makes broken ELF files if you add user-defined
sections to the ELF file (anything other than .text, .data, or .bss).
This bug is fixed in newer versions such as NASM 0.98.08.
- MinGW32 based on GCC 2.95.2
stores the BSS size incorrectly in the BSS section header.
Because of this bug, this compiler will not interoperate with NASM or
Microsoft compilers.
- The Cygwin and MinGW32 linkers crash when asked to make a binary
kernel. You can get around this to some extent by linking to PE
format with identical memory alignment and file alignment:
ld --oformat pei-i386 --file-alignment 0x1000 --section-alignment 0x1000 ...
Now you have a 'binary' kernel with a 4096-byte PE header at the beginning,
which the bootloader can skip over (thanks to Tim Robinson for this idea).
Note that this may still fail if you have user-defined sections that
come after the BSS.
- With a linker script like this:
.bss: {
*(.bss)
*(.common)
end = . ; _end = . ;
}
some buggy versions of ld may put 'end' between common and bss.
Use this instead:
...
*(.common)
}
end = .; _end = . ;
(thanks to Jarek Pelczar for finding this bug)
- The ELF binutils for DJGPP are somewhat buggy. Using objcopy
to convert between ELF and COFF will fail with relocatable (.o) files,
and with executable files that contain user-defined sections (anything
other than .text, .data, and .bss).
- ld version 2.6 doesn't set the LMAs properly in ELF files,
even if you use AT() in a linker script. This bug may cause GRUB
to choke on your ELF kernel.
Bootloader 'foo' doesn't work with kernel 'bar'
According to the Multiboot standard (used by the GRUB bootloader),
a pmode kernel must not rely on the GDT layout, segment descriptors,
or selectors defined by the bootloader. This turns out to be good
advice for other bootloaders as well. The initial code of your
kernel (pmode or real mode) should be written in a very
'defensive' manner that makes few or no assumptions about the
bootloader. This will probably require the use of
assembly language and relative addressing.
DOS 7+ loads HIMEM.SYS silently and automatically, then loads itself
high (into the HMA). An easy way to deal with this is to avoid the
HMA when you load or copy the kernel into extended memory. A better
(safer) way is to use XMS to find and allocate free extended memory.
This will prevent you from stepping on any XMS 'clients' such as
SMARTDRV.
Watch out for the case where HIMEM.SYS is loaded and SMARTDRV is also
loaded in such a way that the free XMS memory block straddles a
4 meg boundary. If your kernel is loaded into this memory, and it uses
paging, the kernel will need two page tables to map the kernel memory
(each page table maps 4 meg).
The kernel could be copied to 1 meg after it's been loaded. This will
probably trash DOS, so do it just before entering pmode.
Kernel code and data are not linked to proper addresses
- Use a linker script to get control of this (see 'More linker
gotchas', below).
- If you are linking to binary format, link to a non-binary format
instead (COFF, ELF, PE, etc.), then dump the symbols and disassemble
the kernel, THEN convert to binary:
ld -g -Tcoffkrnl.ld -o krnl.cof $(OBJS) lib/libc.a
objdump --line-numbers --source krnl.cof >krnl.lst
nm -n krnl.cof >krnl.sym
objcopy -O binary krnl.cof krnl.bin
Examine files krnl.lst and krnl.sym to see if
everything is properly located.
- For x86, the kernel data segment is more sensitive to link/locate
errors than the code segment, because most jumps and calls use
EIP-relative addressing while data references use absolute addresses.
Your kernel startup code can test if the kernel data segment has been
properly linked, located, and loaded:
DS_MAGIC equ 3544DA2Ah
SECTION .text
BITS 32
GLOBAL entry
entry:
call where_am_i ; where did the loader put us?
where_am_i:
pop esi ; ESI=physical adr (where we are)
sub esi,where_am_i ; subtract virtual adr (where we want to be)
; now ESI=virt-to-phys
cmp dword [esi + ds_magic],DS_MAGIC
je ds_ok
mov word [0B8000h],9F44h ; display blinking white-on-blue 'D'
jmp short $ ; freeze
ds_ok:
...
SECTION .data
ds_magic:
dd DS_MAGIC
...
More linker gotchas
Entry point of binary kernel compiled from C
The entry point of a binary kernel compiled from C is not
necessarily the first function in the C file. Try this:
copy con hello.c (cat >hello.c for Linux)
#include <stdio.h>
int main(void) { printf("hello"); return 0; }
^Z (^D for Linux)
gcc -c -O2 -g hello.c
objdump --disassemble hello.o
00000000 <.text>:
hello 0: 68 65 6c 6c 6f push $0x6f6c6c65
\0 5: 00 89 f6 55 89 e5 add %cl,0xe58955f6(%ecx)
00000008 <_main>:
The first thing in the .text section is not main(), but the
string 'hello'. You could put main() in a file all by itself:
int main(void) { return real_main_in_another_file(); }
or put a dummy main() with no local variables and no literals
at the beginning of the file:
int real_main(int arg_c, char *arg_v[]);
int main(int arg_c, char *arg_v[])
{ return real_main(arg_c, arg_v); }
/* ... */
int real_main(int arg_c, char *arg_v[])
{ /* ... */ }
or compile with -fwritable-strings:
gcc -fwritable-strings ...
-fwritable-strings is deprecated in GCC 3.4 and was removed in GCC 4.0.
A related problem: COFF relocatable (.o) files do not have an a.out header,
so the entry point can not be specified. You can assume the entry point
is the start of the .text section, but observe the precautions shown here
if the .o file is built only from C code.
LGDT and LIDT require LINEAR addresses
Normal segment-based address translation does not apply to the GDT
address in the 'pseudo-descriptor' loaded by the LGDT instruction
(same for IDT and LIDT). You must perform the equivalent address
translation yourself:
...
; now in real mode
xor ebx,ebx
mov bx,ds
shl ebx,4 ; DS * 16
lea eax,[gdt + ebx] ; "fixup"
mov [gdt_ptr + 2],eax
lgdt [gdt_ptr]
...
gdt:
; ...your GDT here...
gdt_end:
gdt_ptr:
dw gdt_end - gdt - 1
dd gdt ; this must be LINEAR GDT address
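The same fixup can be written out in C to check the arithmetic. This is only a sketch: real_to_linear(), make_gdt_ptr() and struct pseudo_desc are made-up names, and the packed attribute assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode segment:offset to linear address: segment * 16 + offset. */
static uint32_t real_to_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* The 6-byte operand of LGDT/LIDT (hypothetical struct name). */
struct pseudo_desc {
    uint16_t limit;  /* table size in bytes, minus 1 */
    uint32_t base;   /* LINEAR address of the table */
} __attribute__((packed));

/* Build the pseudo-descriptor for a GDT at DS:gdt_off, gdt_bytes long. */
static struct pseudo_desc make_gdt_ptr(uint16_t ds, uint16_t gdt_off,
                                       uint16_t gdt_bytes)
{
    struct pseudo_desc p;
    assert(gdt_bytes != 0);
    p.limit = (uint16_t)(gdt_bytes - 1);
    p.base = real_to_linear(ds, gdt_off);
    return p;
}
```

With DS=07C0h and the GDT at offset 0100h, the base stored in the pseudo-descriptor is 7D00h, not 0100h.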
Video output doesn't work
Perhaps you are accessing video memory like this:
*(unsigned char *)0xB8000 = 'A';
This works only if the base address of the kernel data segment is 0.
If your OS does not meet this requirement, you can define a
separate protected-mode segment descriptor with base address 0,
then use far pointer functions to access video memory. Assuming
LINEAR_SEL is the selector for the zero-base segment, then:
/* _farpokeb() is completely defined (prototype AND body) in DJGPP sys/farptr.h */
#include <sys/farptr.h>
...
_farpokeb(LINEAR_SEL, 0xB8000, 'A');
Or, you can use near pointers if you subtract the base address
of the kernel data segment (virt_to_phys):
*(unsigned char *)(0xB8000 - virt_to_phys) = 'A';
For near pointers to work, the kernel data segment must have no limit
(i.e. limit = 4 Gig - 1 = 0xFFFFFFFF).
Use the BIOS to get an accurate accounting of memory
CMOS will not report more than 63.999 meg (65535/1024) of extended memory,
and it won't report if there are 'holes' in extended memory.
As for direct probing of memory size,
it has problems:
- Address 'aliasing', e.g. addresses above 64 meg 'wrap around'
to address 0.
- Memory mapped hardware that causes the computer to freeze
when your memory probe steps on it.
- PC might have a bizarre and unique method of turning on the
A20 gate; a method not supported by your direct probing code.
- Failure of C code to use volatile where necessary,
or buggy compiler that produces faulty code even with volatile.
- Bus float may cause the memory probe to succeed even if
there is no memory.
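The text above does not say which BIOS call to use. One widely used call is INT 15h AX=E820h, which returns a list of address ranges, each with a type. Assuming the bootloader has already copied that list into an array, totalling the usable RAM might look like this sketch (struct e820_entry and usable_bytes() are made-up names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_USABLE 1u  /* range type 1 = RAM available to the OS */

/* Fields of one memory-map entry, after copying out of the BIOS buffer. */
struct e820_entry {
    uint64_t base;    /* start address of the range */
    uint64_t length;  /* size of the range in bytes */
    uint32_t type;    /* 1 = usable, other values = reserved etc. */
};

/* Add up the lengths of all usable ranges. Holes and reserved
   ranges are skipped, which is what CMOS cannot tell you. */
static uint64_t usable_bytes(const struct e820_entry *map, size_t count)
{
    uint64_t total = 0;
    assert(map != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (map[i].type == E820_USABLE)
            total += map[i].length;
    return total;
}
```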
Trouble with A20
There is no single method of controlling A20 that works on all
PCs (HIMEM.SYS supports 17 different methods). Therefore:
- Use GRUB to boot your OS.
- Don't turn on A20 if you don't need to (e.g. if your pmode
kernel runs in conventional memory).
- If HIMEM.SYS is loaded, use its XMS services to control A20.
- If you want to copy something to extended memory, use INT 15h
AH=87h. It will control A20 automatically.
- Try to use INT 15h AH=89h to enter pmode. It will control A20
automatically.
- Before trying to turn on the A20 gate, check if it's already on.
- If you use your own code to enable A20, verify that A20 is on
after you enable it.
- If you use your own code to enable A20, your code should
try several methods of enabling A20.
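The 'check if it's already on' and 'verify' steps usually come down to a wrap-around test: with A20 off, an address just above 1 meg aliases the same address minus 1 meg. Here is a sketch of that test. The function name is made up, and the two pointers are parameters so the logic can be tried on ordinary memory; in a kernel they might be linear addresses such as 000500h and 100500h (an assumption, any pair 1 meg apart in RAM would do).

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if 'low' and 'high' are distinct memory (A20 on),
   0 if writes to 'high' show up at 'low' (A20 off). */
static int a20_enabled(volatile uint8_t *low, volatile uint8_t *high)
{
    uint8_t saved_low, saved_high;
    int distinct;

    assert(low != 0 && high != 0);
    saved_low = *low;
    saved_high = *high;
    *low = 0x00;
    *high = 0xFF;              /* if aliased, this overwrites *low too */
    distinct = (*low != 0xFF);
    *high = saved_high;        /* put back whatever was there */
    *low = saved_low;
    return distinct;
}
```

Run it with interrupts disabled, since it briefly modifies memory that other code may be using.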
Only one interrupt from keyboard
You won't get more than one interrupt from the hardware devices
unless you reset or clear the interrupt at the end of your
interrupt service routine (ISR). For all devices, you must
clear the interrupt at the 8259 interrupt controller chip.
For IRQs 0-7:
outportb(0x20, 0x20);
For IRQs 8-15:
outportb(0xA0, 0x20);
outportb(0x20, 0x20);
You must also clear the interrupt at the device that caused it.
This is usually done by reading an I/O register.
Timer: nothing (you need only clear timer interrupts at the 8259 chip)
Keyboard: read scancode byte from I/O port 0x60
Realtime clock: outportb(0x70, 0x0C); (void)inportb(0x71);
IDE disk: read status byte from I/O port 0x1F7
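The 8259 part can be wrapped in one helper that picks the right ports from the IRQ number. In this sketch send_eoi() is a made-up name, and outportb() is replaced by a stand-in that only records the port numbers, so the logic can be tried outside a kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PIC1_CMD 0x20   /* master 8259 command port */
#define PIC2_CMD 0xA0   /* slave 8259 command port */
#define PIC_EOI  0x20   /* non-specific end-of-interrupt command */

/* Stand-in for the real port-output routine: it just logs the port.
   A kernel would call its own outportb() here instead. */
static uint16_t port_log[4];
static int port_log_len;

static void outportb(uint16_t port, uint8_t value)
{
    (void)value;
    if (port_log_len < 4)
        port_log[port_log_len++] = port;
}

/* Clear IRQ 0-15 at the interrupt controller chip(s). */
static void send_eoi(int irq)
{
    assert(irq >= 0 && irq <= 15);
    if (irq >= 8)
        outportb(PIC2_CMD, PIC_EOI);  /* slave, for IRQs 8-15 only */
    outportb(PIC1_CMD, PIC_EOI);      /* master, for every IRQ */
}
```

Call it at the end of the ISR, after the device itself has been cleared.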
Re-entrancy problems with interrupt handlers
Don't use printf() in a top-half interrupt handler! printf() and
many other functions are not re-entrant. You should probably avoid
floating-point math in interrupt handlers, for the same reason.
Mixing 16- and 32-bit code
Use aout, .obj (OMF) or other file format that supports this:
nasm -f elf x.asm
x.asm:30: ELF format does not support non-32-bit relocations
The 16-bit objects must be below 64K (0x10000). Otherwise:
ld -s --oformat binary -Ttext=0x10000 -ox.bin x.o y.o
x.o(.text+0x13): relocation truncated to fit: 16 text
Lastly, the linker must support the object file format used:
ld-elf -o test test.o
test.o: file not recognized: File format not recognized
If you can't meet these conditions, then the 16- and
32-bit code must go into separate files.
Don't forget to zero the kernel BSS
Global and static local variables that are not assigned an initial
value when declared are stored in the uninitialized data segment
(BSS). Either the bootloader or the kernel startup code must zero
the BSS.
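If the kernel startup code does the job, a plain loop is enough. This is a sketch: zero_bss() is a made-up name, and in a real kernel the two pointers would come from symbols that your linker script places at the start and end of the BSS.

```c
#include <assert.h>
#include <stddef.h>

/* Zero every byte in [start, end). A plain loop, so no memset() is
   needed. The assert() is only there for trying this on a hosted
   system; drop it in a freestanding kernel. */
static void zero_bss(char *start, char *end)
{
    assert(start != NULL && end != NULL && start <= end);
    for (char *p = start; p < end; p++)
        *p = 0;
}
```

Call it before any C code that reads a global or static variable.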
16-bit DPMI problems with Turbo or Borland C for DOS
Borland C++ for DOS (version 3.1 or newer) and Turbo C++ for DOS
(version 3.0 or newer, not the free version 1.0) use
16-bit DPMI. This conflicts with DJGPP, which uses 32-bit DPMI.
If you mix Borland and DJGPP tools, you get strange error messages:
Using DJGPP MAKE (32-bit DPMI) to invoke Turbo C 3.0 (16-bit DPMI)
from plain DOS:
c:\tc\bin\tcc.exe -v -mt -w -O2 -d -Z -1 -D__STARTUP_ASM__=1 -c -oboot.obj boot.c
16-bit DPMI unsupported.
make.exe: *** [tboot.exe] Error 1
Using DJGPP MAKE to invoke Turbo C 3.0 from Windows DOS box
(note the absence of error message text):
c:\tc\bin\tcc.exe -v -mt -w -O2 -d -Z -1 -D__STARTUP_ASM__=1 -c -oboot.obj boot.c
make.exe: *** [tboot.exe] Error 234
Using Turbo C 3.0 MAKE to invoke DJGPP from plain DOS:
gcc -c boot.c
Load error: no DPMI - Get csdpmi*b.zip
** error 110 ** deleting all
Using Turbo C 3.0 MAKE to invoke DJGPP from Windows DOS box:
gcc -c boot.c
Load error: can't switch mode
** error 106 ** deleting all
Fix your PATH. In a pinch, you can also use Borland MAKER.EXE to
invoke DJGPP tools. MAKER.EXE runs in real mode, instead of 16-bit pmode.
Trouble linking C to asm, or C++ to C, or C++ to asm
This is caused either by C++ name-mangling or by leading underscores.
Asm labels with the same name as an instruction
To avoid this problem with NASM, prepend a dollar sign ($) to the
label. This does not add $ to the label, it merely tells NASM
'this is a label, not a reserved word':
GLOBAL $cli
$cli:
cli
ret
(Thanks to Julian Hall for this tip.)
objcopy -O binary ... produces garbage
Be sure to remove the file sections you don't want:
# -g strips debug sections (.stabs, .stabstr)
objcopy -g -O binary -R .note -R .comment krnl.elf krnl.bin
Also, MinGW objcopy 2.9.4 is known to be buggy. Try a newer version.
Trouble installing bootsector with RAWRITE
RAWRITE for DOS writes anywhere from 3 sectors to one entire track
at a time. If you try to use it to install a bootsector on a FAT12
floppy, it will overwrite the first FAT. (I don't know if the
Windows version of RawWrite works any better.)
Turbo C .EXE files are inordinately large
Compiling (tcc -v ...) or linking (tlink /v ...) with
the debug option apparently implies TLINK /v /i ...
The /i option puts a zeroed BSS in the .EXE file. Normally, only the
BSS size is stored in the .EXE file header, and BSS memory is allocated
when the .EXE file is loaded by DOS. Even if TDSTRIPped, the .EXE
file will be larger than if you compiled and linked without debug info.
'fixed or forbidden register ... was spilled'
'can't find a register in class `[AREG|BREG|CREG|DREG]' while reloading `asm'
New versions of GCC are pickier about the clobber lists used
in inline asm. Though it worked fine with older versions of GCC,
the following code is now considered incorrect:
static inline void
memset(void *__dest, unsigned int __fill, unsigned int __size) {
__asm__ __volatile__ ("cld
rep
stosb" :
/* no outputs */ :
"c" (__size),
"a" (__fill),
"D" (__dest) :
"ecx","eax","edi","memory");
}
because registers ECX, EAX, and EDI are present in both the clobber list
and the input constraints. Remove these registers from the clobber list:
...
"a" (__fill),
"D" (__dest) :
"memory");
}
and the code should compile without error.
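Note that REP STOSB changes ECX and EDI, and GCC assumes that input-only operands are left unchanged. A more careful version lists those two registers as in/out operands ("+c", "+D"). The sketch below is one way to write it; fill_bytes() is a made-up name (to avoid clashing with the library memset), and the plain C branch is there only so the file builds on non-x86 machines.

```c
#include <assert.h>
#include <stddef.h>

/* Fill 'size' bytes at 'dest' with the low byte of 'fill'. */
static inline void fill_bytes(void *dest, unsigned int fill, size_t size)
{
    assert(dest != NULL || size == 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+D", "+c": the destination and count registers are read AND
       modified by REP STOSB. "a": AL is only read.
       "memory": the asm writes through dest. */
    __asm__ __volatile__ ("cld\n\t"
                          "rep stosb"
                          : "+D" (dest), "+c" (size)
                          : "a" (fill)
                          : "memory", "cc");
#else
    /* Portable fallback for other compilers and CPUs. */
    unsigned char *p = dest;
    while (size--)
        *p++ = (unsigned char)fill;
#endif
}
```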
Don't name your Linux program 'test'
test is a command built into the Linux shell (bash). If you compile
a small program named test and try to run it by typing test, the
built-in test will run instead, and your program will appear to
do nothing. Run it as ./test, or pick another name.
IRET will cause a TSS-based task-switch if EFLAGS.NT is set
GRUB version 0.90 leaves this bit set. The kernel startup code should
probably do something like this:
push 2
popf
before enabling interrupts, starting multitasking, or otherwise using IRET.
IRET to Ring 3 doesn't save Ring 0 stack pointer
If an exception switches the processor from Ring 3 (user privilege) to
Ring 0 (kernel privilege), the Ring 0 stack pointer will automatically
be loaded from the TSS. However, the reverse is not true: before using
IRET to return from Ring 0 to Ring 3, you must save the Ring 0
stack pointer in the TSS:
; None of this code is pre-emptible. I assume that it runs
; with interrupts disabled (i.e. it's called via interrupt gates)
isr00: ; DIVIDE ERROR
...
isr0D: ; GPF
nop ; From Ring 3, it pushes 6 dwords:
nop ; SS, ESP, EFLAGS, CS, EIP, error code
push byte 0Dh ; push exception number (+1 dword)
jmp all_ints
isr0E: ; PAGE FAULT
...
all_ints:
push gs ; push segment registers (+4 dwords)
push fs
push es
push ds
pusha ; push GP registers (+8 dwords)
mov ax,SYS_DATA_SEL
mov ds,eax ; put known-good values in seg regs
mov es,eax
mov fs,eax
mov gs,eax
push esp ; push pointer to stacked regs_t
call fault ; call C language handler
pop eax ; drop pointer to stacked regs_t
lea eax,[esp + 76] ; 19 dwords == 76 bytes
mov [tss_esp0],eax ; Ring 0 ESP value after IRET
popa ; pop GP registers (-8 dwords)
pop ds ; pop segment registers (-4 dwords)
pop es
pop fs
pop gs
add esp,8 ; drop exception number and
; error code (-2 dwords)
iret ; IRET pops EIP, CS, EFLAGS, ESP, SS
; (-5 dwords)
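The 76-byte figure can be checked against a C struct that mirrors the stacked registers, lowest address first. This is a sketch: the field names are assumptions, and only the order and count follow the pushes in the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack frame as seen by the C handler (what 'push esp' points to). */
typedef struct {
    /* pushed by PUSHA (8 dwords); EDI is pushed last, so it is lowest */
    uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
    /* segment registers (4 dwords); DS is pushed last of the four */
    uint32_t ds, es, fs, gs;
    /* exception number and error code (2 dwords) */
    uint32_t which_int, err_code;
    /* pushed by the CPU on entry from Ring 3 (5 dwords) */
    uint32_t eip, cs, eflags, user_esp, user_ss;
} regs_t;

/* Ring 0 ESP once IRET has popped the whole frame: this is the value
   to store in the TSS. frame_addr is ESP after 'pop eax' above. */
static uint32_t esp0_after_iret(uint32_t frame_addr)
{
    assert(frame_addr % 4 == 0);
    return frame_addr + (uint32_t)sizeof(regs_t);
}
```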
PC crashes or freezes when returning to real mode
The procedure to do this given in section 14.5 of 386INTEL.TXT
is not complete or accurate. Try this instead:
- Disable interrupts
- If paging is enabled:
- Jump to a region of memory that is identity-mapped
- Clear the PG bit in register CR0
- Write 0 to register CR3 to flush the TLB (page table cache)
- Jump to a segment with limit 64K (FFFFh). CS must have this limit
before you return to real mode.
- Load SS with a selector that points to a descriptor that is
appropriate for real mode:
Limit = 64K (FFFFh) Byte-granular (G=0)
Expand-up (E=0) Writable (W=1)
Present (P=1) Base address = any value
SS must have limit 64K (FFFFh) before you return to real mode. You may
leave the other data segment registers with limits >64K if you want
'unreal mode', otherwise load them with a similar selector.
- Clear the PE bit in register CR0
- Jump to a 16:16 real-mode far address
- Load all other segment registers (SS, DS, ES, FS, GS)
- Use the LIDT instruction to load an IDT appropriate for real mode,
with base address = 0 and limit = 3FFh. Use the 32-bit operand size
override prefix so all 32 bits of the IDT base are set (otherwise,
only the bottom 24 bits will be set).
o32 lidt [real_idt]
...
real_idt:
dw 1023
dd 0
- Zero the high 16 bits of 32-bit registers. If the register value is not
important, just zero the entire 32-bit register, otherwise use 'movzx':
xor eax,eax
...
movzx ebp,bp
movzx esp,sp
- Enable interrupts
Differences between this routine and that given in 386INTEL.TXT:
- Disabling interrupts should be the FIRST thing you do.
- Enabling interrupts should be the LAST thing you do.
- Can leave limit >64K for segment registers other than CS and SS
(for 'unreal' mode)
- Must use o32 prefix with LIDT instruction in 16-bit code segment
- Top 16 bits of registers must be zeroed before returning to
DOS/real mode
Build the IDT at run-time
Look at the layout of a 32-bit interrupt gate:
Lowest byte | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Highest byte
Offset 7:0 | Offset 15:8 | Selector 7:0 | Selector 15:8 | Word Count 4:0 | Access | Offset 23:16 | Offset 31:24
The 32-bit Offset is split into two 16-bit halves, with the other four
bytes of the gate between the two halves. Because of this, it's nearly
impossible to create a non-trivial IDT at compile-time (or assemble-time).
Another reason to build the IDT at run-time is to put the first 7 entries
in a non-cachable page of memory. This is a work-around for the
Pentium F00F bug, devised by Robert Collins.
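Filling in a gate at run-time is just a matter of storing the eight bytes in the order given in the layout above. A minimal sketch follows; set_gate() is a made-up name, and 0x8E is shown as an example Access value (present, Ring 0, 32-bit interrupt gate).

```c
#include <assert.h>
#include <stdint.h>

/* Store one 8-byte interrupt gate, lowest byte first. */
static void set_gate(uint8_t gate[8], uint32_t offset,
                     uint16_t selector, uint8_t access)
{
    assert(gate != 0);
    gate[0] = (uint8_t)(offset);          /* Offset 7:0 */
    gate[1] = (uint8_t)(offset >> 8);     /* Offset 15:8 */
    gate[2] = (uint8_t)(selector);        /* Selector 7:0 */
    gate[3] = (uint8_t)(selector >> 8);   /* Selector 15:8 */
    gate[4] = 0;                          /* Word Count: 0 for interrupt gates */
    gate[5] = access;                     /* Access byte, e.g. 0x8E */
    gate[6] = (uint8_t)(offset >> 16);    /* Offset 23:16 */
    gate[7] = (uint8_t)(offset >> 24);    /* Offset 31:24 */
}
```

The startup code would call this once per handler, passing the handler's address as the offset, before executing LIDT.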
Stack problems using INT 15h AX=1687h (DPMI) to enter pmode
Suppose you make an .EXE stub meant to convert a 32-bit executable
file (DJGPP COFF, Win32 PE COFF, ELF, etc.) into a 32-bit DOS
executable. But your 32-bit code crashes when you try to use the
stack. Why?
If the .EXE stub starts out with SP=0, this will be equal to ESP=0
after entering 32-bit pmode. The first PUSH will decrement this to
ESP=0FFFFFFFCh. It's unlikely that such a large address is valid in a
DPMI environment. Certainly, the stack is no longer where you expect
it to be.
The .EXE stub should set SP=0FFFCh. The switch to pmode will zero-extend
this to ESP=0000FFFCh. Either that, or allocate a completely new stack
for the 32-bit code, instead of re-using the 16-bit DOS stack.
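The wrap-around is easy to reproduce with unsigned 32-bit arithmetic. A small sketch (esp_after_push() is a made-up name):

```c
#include <assert.h>
#include <stdint.h>

/* ESP after the first 32-bit PUSH, given the 16-bit SP that the
   .EXE stub had when it switched to pmode. */
static uint32_t esp_after_push(uint16_t sp)
{
    uint32_t esp = sp;      /* SP is zero-extended to ESP */
    assert(esp % 4 == 0);   /* keep the stack dword-aligned */
    return esp - 4u;        /* unsigned subtraction wraps modulo 2^32 */
}
```

Starting from SP=0 this gives 0FFFFFFFCh; starting from SP=0FFFCh it gives 0000FFF8h, which is still inside the original 64K stack segment.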
Don't put DJGPP COFF and a.out files in the same archive (.a) file
From a message on Dark Fiber's OS message board:
The problem, which the DOCs for Djgpp failed to mention, is that AOUT AND
COFF files should NOT be in the
same library file!!! LD can't Handle such a library file ...
Declare linker script symbols as external char arrays
If you define symbols in a linker script (e.g. g_code, _edata),
declare them in C as external character arrays:
extern char g_code[], _edata[];
You can also use unsigned char if you want. Avoid the following
declarations, which may not do what you want:
extern char *g_code; /* doesn't work */
extern int g_code[]; /* sizeof(int) != 1 */
GCC for BeOS makes position-independent code by default
Without command-line options to the contrary, GCC for BeOS runs as though
you typed gcc -fPIC ... If you are building a kernel, turn off
position-independent code with gcc -fno-PIC ...
'ld: krnl.x: Not enough room for program headers, try linking with -N'
If you're linking an ELF kernel with different physical and virtual addresses
(e.g. 0x100000 physical, 0xC0000000 virtual), this error may occur if
you don't use AT() in the linker script, or if the linker script is buggy.
Can't make Multiboot kernel with ILINK32
ILINK32 is the linker that comes with Borland C++ 5.5. It is impossible
(literally) to make a Multiboot-compatible kernel with this linker.
Explanation:
- ILINK32 can't make an ELF file, so you must use the aout kludge
- For the aout kludge to work properly, the memory alignment and file
alignment must be equal
- If memory alignment = file alignment, the only value that ILINK32
will accept is 4096, i.e.
ilink32 -Ao:0x1000 -Af:0x1000 ...
- With these alignment settings, .text starts at file offset 8192.
However, the Multiboot header must be within the first 8192 bytes
of the executable file.
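You can check any kernel image for this problem by scanning it the way a Multiboot loader does: look for the magic value 1BADB002h on a 4-byte boundary in the first 8192 bytes, and check that magic + flags + checksum is zero. A sketch, assuming a little-endian host such as x86 (find_multiboot() is a made-up name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MB_MAGIC  0x1BADB002u  /* Multiboot header magic */
#define MB_SEARCH 8192u        /* header must lie entirely in this range */

/* Return the file offset of a valid Multiboot header, or -1 if there
   is none in the first 8192 bytes of the image. */
static long find_multiboot(const unsigned char *image, size_t size)
{
    size_t limit = size < MB_SEARCH ? size : MB_SEARCH;

    assert(image != NULL || size == 0);
    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t magic, flags, checksum;
        /* memcpy avoids unaligned or aliasing-unsafe loads */
        memcpy(&magic, image + off, 4);
        memcpy(&flags, image + off + 4, 4);
        memcpy(&checksum, image + off + 8, 4);
        if (magic == MB_MAGIC && (uint32_t)(magic + flags + checksum) == 0)
            return (long)off;
    }
    return -1;
}
```

A header that sits at file offset 8192, as with the ILINK32 alignment settings above, is not found.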
A20 must be enabled to reboot using the keyboard controller
When the RESET signal is asserted on a 32-bit x86 CPU, the CPU begins
execution at address 0xFFFFFFF0. This address is within the motherboard
ROM BIOS, which is at 0xFFFF0000 (and at 0xF0000). If the A20 gate is
disabled, address 0xFFFFFFF0 becomes 0xFFeFFFF0, and the CPU fetches
code from non-existent memory after reset.
A20 is normally enabled in protected-mode operating systems, but keep
this in mind if you write a real-mode OS.