Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-6
From: Ingo Molnar
Date: Thu Dec 09 2004 - 04:33:59 EST
* Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Wed, 2004-12-08 at 21:39 +0000, Rui Nuno Capela wrote:
> >
> > Almost there, perhaps not...
> >
> > It doesn't solve the problem completely, if at all. What was kind of a
> > deterministic failure now seems probabilistic: the fault still occurs on
> > unplugging the usb-storage stick, but not every time as before.
> >
>
> OK, so I would say that this is part of a fix, but there are others.
> There are lots of changes done to the slab.c file by Ingo. The change I
> made (and that is just a quick patch, it needs real work), was only in a
> place that was obvious that there could be problems.
>
> Are you running an SMP machine? If so, then the patch I gave you is
> definitely not enough.
one of Rui's boxes is an SMP system - which would explain why the bug
goes from an 'always crash' to 'spurious crash'. (if Rui's laptop
triggers this problem too then there must be something else going on as
well.)
> Ingo really scares me with all the removing of local_irq_disables in
> the rt mode. I'm not sure exactly what is going on there, and why they
> can, or should be removed. Ingo?
it is done so that the SLAB code can be fully preempted too. The SLAB
code is of central importance to the -RT project, if it's not fully
preemptible then that has a ripple effect on other subsystems (timer,
signal code, file handling, etc.).
So while making it fully preemptible was quite challenging (==dangerous,
scary), i couldn't just keep the SLAB using raw spinlocks, due to the
locking dependencies. (nor did i have any true inner desire to keep it
non-preemptible - the point of PREEMPT_RT is to have everything
preemptible. I want to see how much preemption the Linux kernel can take
=B-) It has held up surprisingly well i have to say.)
to make the SLAB code fully preemptible, there were two main aspects
that i had to fix:
1) irq context execution
2) process preemption
in the -RT kernel all IRQ contexts execute in a separate process
context, so the SLAB code is never called from a true IRQ context -
hence problem #1 is solved. As far as #1 is concerned, the
local_irq_disable()s are not needed anymore.
the other aspect is process<->process preemption - which can still occur
in the -RT kernel (and is the whole point of the PREEMPT_RT feature).
This means that the per-CPU assumptions within slab.c break.
To solve this i've turned the unlocked per-CPU SLAB code to be
controlled by the cachep->spinlock. (on RT only - on non-RT kernels the
SLAB code should be largely unmodified - this is why all that _rt and
_nort API trickery is done.) Since the SLAB code is thus locked by
cachep->spinlock on PREEMPT_RT, other tasks cannot interfere with the
internal data structures.
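
A minimal user-space sketch of the locking idea described above, not the actual slab.c change. It assumes a pthread mutex standing in for cachep->spinlock, and every name in it (TOY_PREEMPT_RT, toy_cache, toy_lock_rt, toy_alloc, toy_free) is hypothetical. The point it illustrates: with the RT switch on, the formerly unlocked per-CPU arrays are only touched under the per-cache lock; with it off, the lock helpers compile away.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_PREEMPT_RT 1   /* assumption: stands in for the RT config option */
#define TOY_NR_CPUS    2
#define TOY_BATCH      4

/* Per-CPU stack of free objects (LIFO, to keep recently freed objects hot). */
struct toy_array_cache {
    int   avail;
    void *entry[TOY_BATCH];
};

struct toy_cache {
    pthread_mutex_t lock;                     /* plays the role of cachep->spinlock */
    struct toy_array_cache cpu[TOY_NR_CPUS];
};

/*
 * "_rt" style helpers: real locking on RT, no-ops otherwise.
 * The non-RT kernel relies on disabled interrupts instead, which this
 * user-space sketch does not model.
 */
#if TOY_PREEMPT_RT
# define toy_lock_rt(c)   pthread_mutex_lock(&(c)->lock)
# define toy_unlock_rt(c) pthread_mutex_unlock(&(c)->lock)
#else
# define toy_lock_rt(c)   ((void)0)
# define toy_unlock_rt(c) ((void)0)
#endif

static void toy_cache_init(struct toy_cache *c)
{
    pthread_mutex_init(&c->lock, NULL);
    for (int i = 0; i < TOY_NR_CPUS; i++)
        c->cpu[i].avail = 0;
}

/* Put an object back on the given CPU's array; 0 on success, -1 if full. */
static int toy_free(struct toy_cache *c, int cpu, void *obj)
{
    int ret = -1;

    toy_lock_rt(c);
    struct toy_array_cache *ac = &c->cpu[cpu];
    if (ac->avail < TOY_BATCH) {
        ac->entry[ac->avail++] = obj;
        ret = 0;
    }
    toy_unlock_rt(c);
    return ret;
}

/* Take the most recently freed object from the given CPU's array, or NULL. */
static void *toy_alloc(struct toy_cache *c, int cpu)
{
    void *obj = NULL;

    toy_lock_rt(c);
    struct toy_array_cache *ac = &c->cpu[cpu];
    if (ac->avail > 0)
        obj = ac->entry[--ac->avail];
    toy_unlock_rt(c);
    return obj;
}
```

Because every access to the arrays happens between toy_lock_rt() and toy_unlock_rt(), a task that is preempted in the middle of toy_alloc() cannot have the array modified underneath it by another task using the same cache.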
Finally, there was still the problem of the use of smp_processor_id() -
the non-RT SLAB code (rightfully) assumes that smp_processor_id() is
constant, but this is not true for the RT code - which can be preempted
anytime (still holding the spinlock of course) and can be migrated to
another CPU.
To solve this problem i am saving smp_processor_id() once, before we use
any per-CPU data structure for the first time, and this constant CPU ID
value is cached and used throughout the whole SLAB processing pass.
[ Since in the RT case we lock the cachep exclusively, it's not a
problem if the 'old' CPU's ID is used as an index - as long as the index
is consistent. Most of the time the current CPU's ID will be used so we
preserve most of the performance advantages (==cache-hotness) of per-CPU
SLABs on SMP systems too. (except for the locking, which is serialized
on RT.) ]
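
The CPU ID caching can be illustrated with a small user-space sketch. This is an assumption-laden toy, not the real slab.c code: demo_processor_id() stands in for smp_processor_id(), demo_migrate() simulates the task being preempted and moved to another CPU, and all demo_* names are hypothetical. The cache lock from the RT case is assumed to be held by the caller and is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS 2
#define DEMO_BATCH   4

struct demo_array_cache {
    int   avail;
    void *entry[DEMO_BATCH];
};

static struct demo_array_cache demo_percpu[DEMO_NR_CPUS];
static int demo_cpu_now;   /* simulated "CPU we are currently running on" */

/* Stand-in for smp_processor_id(): may return a different value on each call. */
static int demo_processor_id(void)
{
    return demo_cpu_now;
}

/* Simulate preemption followed by migration to the next CPU. */
static void demo_migrate(void)
{
    demo_cpu_now = (demo_cpu_now + 1) % DEMO_NR_CPUS;
}

static struct demo_array_cache *demo_ac(int cpu)
{
    return &demo_percpu[cpu];
}

/*
 * Allocate one object. The CPU ID is sampled exactly once and that value
 * is used as the index for the whole pass, even if the task migrates
 * in between. The index may be "stale", but it is consistent.
 */
static void *demo_alloc(void)
{
    const int this_cpu = demo_processor_id();   /* sampled once */
    void *obj = NULL;

    if (demo_ac(this_cpu)->avail > 0) {
        demo_migrate();   /* worst case: migration between check and use */

        /* Still index by the cached ID, not by demo_processor_id(). */
        struct demo_array_cache *ac = demo_ac(this_cpu);
        obj = ac->entry[--ac->avail];
    }
    return obj;
}
```

Had demo_alloc() called demo_processor_id() a second time after the migration, it would have checked one CPU's array and then popped from another, possibly empty, one. Using the cached index keeps the check and the pop on the same array.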
SLAB draining was an oversight - it's mainly called when there is VM
pressure (which is not a strictly necessary feature, so i disabled it),
but i forgot about the module-unload case where it's a correctness
feature. Your patch is a good starting point, i'll try to fix it on SMP
too.
Ingo