Re: CONFIG_PREEMPT and server workloads

From: Andrew Morton
Date: Thu Mar 18 2004 - 12:40:26 EST


Andrea Arcangeli <andrea@xxxxxxx> wrote:
>
> > > Worst of all we're now taking spinlocks earlier than needed,
> >
> > Where? CPU scheduler?
>
> Everywhere; see the kmaps: we take the spinlock before instead of after,
> the scheduler, lots of places. I mean, people don't call
>
> preempt_disable()
> kmap_atomic
> spin_lock
>
> they do:
>
> spin_lock
> kmap_atomic

They do? kmap_atomic() disables preemption anyway.
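
For reference, here is roughly what the i386 kmap_atomic() of this era
does (a simplified sketch, debug checks dropped); the preempt bump is
unconditional and happens before we even look at the page:

	void *kmap_atomic(struct page *page, enum km_type type)
	{
		enum fixed_addresses idx;
		unsigned long vaddr;

		/* unconditional, even for lowmem pages and even with
		 * CONFIG_PREEMPT=n */
		inc_preempt_count();
		if (!PageHighMem(page))
			return page_address(page);

		/* pick this CPU's fixmap slot for the given kmap type */
		idx = type + KM_TYPE_NR * smp_processor_id();
		vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
		set_pte(kmap_pte - idx, mk_pte(page, kmap_prot));
		__flush_tlb_one(vaddr);

		return (void *)vaddr;
	}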

> so they're effectively optimizing for PREEMPT=y, and I don't think this
> is optimal for the long term. One can argue the microscalability
> slowdown isn't something to worry about; I certainly don't worry about
> it too much either. It's more a bad coding habit to take the spinlock
> earlier than needed just to avoid preempt_disable().
>
> > > and the preempt_count stuff isn't optimized away by PREEMPT=n,
> >
> > It should be. If you see somewhere where it isn't, please tell us.
>
> the counter is definitely not optimized away, see:
>
> #define inc_preempt_count() \
> do { \
> preempt_count()++; \
> } while (0)
>
> #define dec_preempt_count() \
> do { \
> preempt_count()--; \
> } while (0)
>
> #define preempt_count() (current_thread_info()->preempt_count)
>
> those run even with PREEMPT=n.

The macros are needed for kmap_atomic().

The thread_info->preempt_count field is also used for tracking the hardirq
and softirq nesting depths (in_interrupt(), in_irq(), in_softirq()), so it
cannot be removed with CONFIG_PREEMPT=n.
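
Roughly, the count is a bitfield; a simplified sketch of the i386
hardirq.h definitions of this era (the exact bit widths vary with
config):

	/*
	 * PREEMPT_MASK: 0x000000ff	preemption-disable nesting
	 * SOFTIRQ_MASK: 0x0000ff00	softirq nesting
	 * HARDIRQ_MASK: 0x00ff0000	hardirq nesting
	 */
	#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
	#define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
	#define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK))

	#define in_irq()	(hardirq_count())
	#define in_softirq()	(softirq_count())
	#define in_interrupt()	(irq_count())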

> > We unconditionally bump the preempt_count in kmap_atomic() so that we can
> > use atomic kmaps in read() and write(). This is why four concurrent
> > write(fd, 1, buf) processes on a 4-way are 8x faster than on 2.4 kernels.
>
> sorry, why should the atomic kmaps touch the preempt_count? Are those
> ++/-- useful for anything more than debugging PREEMPT=y issues on a
> kernel compiled with PREEMPT=n? I thought it was just debugging code
> with PREEMPT=n.
>
> I know why the atomic kmaps speed up write, but I don't see how
> preempt_count can help there when PREEMPT=n: the atomic kmaps are purely
> per-cpu and one can't schedule anyway while holding those kmaps (no
> matter whether inc_preempt_count is called or not).

We run inc_preempt_count() in kmap_atomic() so that we can perform
copy_*_user() while holding an atomic kmap in read() and write().
do_page_fault() will detect the elevated preempt count, decline to handle
the fault, and bail out through the exception-table fixup, so the copy
comes back short.
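
For illustration, the relevant check near the top of the i386
do_page_fault() looks roughly like this (a heavily abridged sketch, not
the full handler); with the count elevated we never try to take mmap_sem
and sleep, and the fault is resolved through the fixup path instead:

	/*
	 * If we're in an interrupt, have no user context, or are
	 * running in an atomic region (e.g. holding an atomic kmap)
	 * then we must not take the fault - we cannot sleep here.
	 */
	if (in_atomic() || !mm)
		goto bad_area_nosemaphore;

bad_area_nosemaphore ends up at the exception-table fixup for the faulting
copy_*_user() instruction, so the copy returns the number of bytes it
could not copy and the caller sees a short copy.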

This could be implemented with a separate task_struct field but given that
preempt_count is always there and the infrastructure exists anyway, using
preempt_count() is tidier.
