Re: top stack (l)users for 2.5.69

From: David Howells (dhowells@warthog.cambridge.redhat.com)
Date: Thu May 08 2003 - 05:29:58 EST


> The kernel stack, at least for ix86, is only one, set upon startup
> at 8192 bytes above a label called init_task_unit. The kernel must
> have a separate stack and, contrary to what I've been reading on
> this list, it can't have more kernel stacks than CPUs and, I don't
> see a separate stack allocated for different CPUs.

Not so... Look into asm-i386/thread_info.h on 2.5 kernels. Each process/thread
has a chunk of memory of THREAD_SIZE size (8K on i386) with a small structure
(struct thread_info) lurking at the bottom and its own personal kernel stack
lurking at the top.
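
To make that layout concrete, here is a rough userspace sketch of one such
chunk. The names thread_info_sketch, thread_chunk_sketch, initial_stack_top()
and stack_room() are made up for illustration, and the two fields are
placeholders; the real struct thread_info in asm-i386/thread_info.h has its
own members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THREAD_SIZE 8192  /* 8K on i386, as stated above */

/* Hypothetical stand-in; the real struct thread_info has other fields. */
struct thread_info_sketch {
    void *task;
    unsigned long flags;
};

/* One chunk like this exists per process/thread, regardless of CPU count. */
union thread_chunk_sketch {
    struct thread_info_sketch info;    /* sits at the lowest address */
    unsigned char stack[THREAD_SIZE];  /* the stack grows down from the top */
};

/* Address from which an empty kernel stack would start growing downwards. */
static inline uintptr_t initial_stack_top(union thread_chunk_sketch *chunk)
{
    return (uintptr_t)chunk + THREAD_SIZE;
}

/* Bytes left for the stack before it would run into the small structure. */
static inline size_t stack_room(void)
{
    return THREAD_SIZE - sizeof(struct thread_info_sketch);
}
```

The union is only there to show that the small structure and the stack share
one THREAD_SIZE block; it says nothing about how the kernel allocates it.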

The number of CPUs doesn't have anything to do with it.

Unless you mean "kernel stack pointer"?

> The context of a task (see entry.S) is completely defined by
> its registers, including the hidden part of the segments
> (selectors) that define privilege.

No, it's not. It's also defined by, for instance, all the waitqueues it's on
(these bits of information are frequently stored on the kernel stack) and by
all the internal function information in the call chain leading to whoever
called schedule().

> But no! Not at all. The context of a user does not need to be saved
> on the stack, and in fact, isn't. It's saved in a task structure
> that was created when the original task was born. The pointer to
> that task structure is called 'current' in the kernel. It's in
> the kernel's data space, and everything necessary to put that
> task back together is in that structure.
>
> Context switching is usually not done by pushing all the registers
> onto a stack, then later popping them back. That's not the way
> it works.

Yes it is. The context is saved on the kernel stack belonging to the process that
was executing at the time. entry.S uses SAVE_ALL to build a stack frame
including all the values of all the data registers that were in use at the
time. The pt_regs structure defines the layout of this.

The pt_regs that holds the userspace context for any particular process is
pointed to by current->thread.esp0 or its equivalent on other archs (look at
arch/i386/kernel/ptrace.c).
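
As a rough illustration of how that lookup can work, the sketch below assumes
that esp0 holds the address just above the saved userspace frame, so the frame
sits immediately below it. That placement is an assumption made for the
example, and pt_regs_sketch and user_regs() are hypothetical names with a
deliberately reduced field list, not the real struct pt_regs.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical register frame; the real pt_regs has more fields. */
struct pt_regs_sketch {
    long eax;
    long eip;
    long eflags;
    long esp;
};

/*
 * Assumption: esp0 is the address one past the end of the saved userspace
 * frame, so stepping back by one frame lands on its first byte.
 */
static inline struct pt_regs_sketch *user_regs(uintptr_t esp0)
{
    return (struct pt_regs_sketch *)esp0 - 1;
}
```

A debugger-style caller would then read or patch a saved register with
something like user_regs(esp0)->eip.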

What you get in the THREAD_SIZE chunk for any process is:

        +------------------+ <--- highest addr
        | PT_REGS (userspace)
        +------------------+
        | function call chain
        +------------------+
        | PT_REGS (kernel) <--- MMU exception maybe
        +------------------+
        | function call chain
        +------------------+
        | PT_REGS (kernel) <--- timer interrupt maybe
        +------------------+
        | function call chain
        +------------------+ <--- addr in kernel stack pointer
        |
        :
        |
        +------------------+ <--- limit of stack
        | THREAD_INFO
        +------------------+ <--- lowest addr

The lowest address always resides on an 8KB boundary, and so the address of
THREAD_INFO can be found by ESP&~8191.
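
The masking trick can be tried out in plain C. This is only a sketch of the
arithmetic: thread_info_addr() is a made-up helper name, and the addresses in
the comment are arbitrary example values, not taken from a running kernel.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192  /* chunk size and alignment, 8K on i386 */

/*
 * Clear the low 13 bits of a stack pointer value. Because every chunk
 * starts on a THREAD_SIZE boundary, this gives the lowest address of the
 * chunk that contains sp, which is where THREAD_INFO lives.
 * Example: 0xC0123ABC & ~8191 == 0xC0122000.
 */
static inline uintptr_t thread_info_addr(uintptr_t sp)
{
    return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}
```

This only works because THREAD_SIZE is a power of two and the chunk is aligned
to its own size; any stack pointer value inside the chunk maps back to the
same base.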

That said, some registers are saved in current->thread, but only the minimum
possible. This is done by switch_to() which is called from the scheduler.

Furthermore, interrupt handlers aren't allowed to call schedule (the scheduler
barfs on them if they do), so stacks will only be switched if the top stack
frame belongs to the process and not to an interrupt.

David