Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP

From: Ingo Molnar
Date: Thu Nov 30 2006 - 05:33:04 EST

Next message: Mariusz Kozlowski: "Re: [PATCH] mv643xx add missing brackets"
Previous message: Andreas Schwab: "Re: [patch 1/3] mm: pagecache write deadlocks zerolength fix"
In reply to: Evgeniy Polyakov: "Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP"
Next in thread: Wenji Wu: "RE: [patch 1/4] - Potential performance bottleneck for Linxu TCP"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Evgeniy Polyakov <johnpol@xxxxxxxxxxx> wrote:

> > David's line of thinking for a solution sounds better to me. This
> > patch does not prevent the process from being preempted (for
> > potentially a long time), by any means.
>
> It steals timeslices from other processes to complete tcp_recvmsg()
> task, and only when it does it for too long, it will be preempted.
> Processing backlog queue on behalf of need_resched() will break
> fairness too - processing itself can take a lot of time, so process
> can be scheduled away in that part too.

correct - it's just the wrong thing to do. The '10% performance win'
that was measured was against _9 other tasks who contended for the same
CPU resource_. I.e. it's /not/ an absolute 'performance win' AFAICS,
it's a simple shift in CPU cycles away from the other 9 tasks and
towards the task that does TCP receive.

Note that even without the change the TCP receiving task is already
getting a disproportionate share of cycles due to softirq processing!
Under a load of 10.0 it went from 500 mbits to 74 mbits, while the
'fair' share would be 50 mbits. So the TCP receiver /already/ has an
unfair advantage. The patch only deepends that unfairness.

The solution is really simple and needs no kernel change at all: if you
want the TCP receiver to get a larger share of timeslices then either
renice it to -20 or renice the other tasks to +19.

The other disadvantage, even ignoring that it's the wrong thing to do,
is the crudeness of preempt_disable() that i mentioned in the other
post:

---------->

independently of the issue at hand, in general the explicit use of
preempt_disable() in non-infrastructure code is quite a heavy tool. Its
effects are heavy and global: it disables /all/ preemption (even on
PREEMPT_RT). Furthermore, when preempt_disable() is used for per-CPU
data structures then [unlike for example to a spin-lock] the connection
between the 'data' and the 'lock' is not explicit - causing all kinds of
grief when trying to convert such code to a different preemption model.
(such as PREEMPT_RT :-)

So my plan is to remove all "open-coded" use of preempt_disable() [and
raw use of local_irq_save/restore] from the kernel and replace it with
some facility that connects data and lock. (Note that this will not
result in any actual changes on the instruction level because internally
every such facility still maps to preempt_disable() on non-PREEMPT_RT
kernels, so on non-PREEMPT_RT kernels such code will still be the same
as before.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Mariusz Kozlowski: "Re: [PATCH] mv643xx add missing brackets"
Previous message: Andreas Schwab: "Re: [patch 1/3] mm: pagecache write deadlocks zerolength fix"
In reply to: Evgeniy Polyakov: "Re: [patch 1/4] - Potential performance bottleneck for Linxu TCP"
Next in thread: Wenji Wu: "RE: [patch 1/4] - Potential performance bottleneck for Linxu TCP"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]