Re: [PATCH] RCU for low latency (experimental)

From: Dipankar Sarma
Date: Wed Mar 24 2004 - 18:14:56 EST


On Wed, Mar 24, 2004 at 11:53:26PM +0100, Andrea Arcangeli wrote:
> On Thu, Mar 25, 2004 at 03:09:15AM +0530, Dipankar Sarma wrote:
> >
> > The difference here is that during the callbacks in the kernel thread,
> > I don't disable softirqs unlike ksoftirqd thus giving it more opportunity for
> > preemption.
>
> preemption where? in irq? softirqd will preempt just fine.
>
> Also not that while we process the callbacks in softirqd, the irqs will
> stop running the callbacks, just like in your scenario. the callbacks
> and the other softirqs will be processed from user context, and it won't
> be different from scheduling krcud first and ksoftirqd later.

It is different because the heart of ksoftirqd - do_softirq() is
run with local_bh_disable(). Sure, you get chances to preempt
there, but not as much as krcud which runs completely in process
context. Of course, it depends on how long we stay with
softirqs disabled in do_softirq().

> > I had already been testing this for the DoS issue, but I tried it
> > with amlat on a 2.4 GHz (UP) P4 xeon with 256MB ram and dbench 32 in a
> > loop. Here are the results (CONFIG_PREEMPT = y) -
> >
> > 2.6.0 vanilla - 711 microseconds
> > 2.6.0 + throttle-rcu - 439 microseconds
> > 2.6.0 + rcu-low-lat - 413 microseconds
> >
> > So under dbench workload atleast, throttling RCU works just as
> > good as offloading them to a kernel thread (krcud) as rcu-low-lat
> > does.
>
> very nice.

Yes. BTW, throttle-rcu too can't avoid route cache overflow under
DoS stress test. Different problem, but I just wanted to mention
this. I will publish those results separately.


> > I used the following throttle-rcu patch with rcupdate.rcumaxbatch
> > set to 16 and rcupdate.rcuplugticks set to 0. That is essentially
> > equvalent to Andrea's earler suggestion. I have so many knobs in
> > the patch because I had written it earlier for other experiments.
> > Anyway, if throttling works as well as this in terms of latency
> > for other workloads as well and there isn't any OOM situations,
> > it is preferable to creating another per-cpu thread.
>
> I don't think this is enough (it is enough for the above workload
> though, maybe for the dcache too, but it's not generic enough), 16
> callbacks per tick too low frequency, it may not keep up, that's why I

That was not 16 callbacks per tick, it was 16 callbacks in one
batch of a single softirq. And then I reschedule the RCU tasklet
to process the rest. I am planning to vary this and see if we
should do even less per softirq.

> But I certainly agree this is a nice solution in practice (at least with
> all the common usages).

Needs more testing with other workloads. I will have some more results
with the tiny files test.

Thanks
Dipankar
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/