Re: [patch] smptimers, old BH removal, tq-cleanup, 2.5.39

From: Jeff Garzik (jgarzik@pobox.com)
Date: Sun Sep 29 2002 - 13:38:50 EST


Ingo Molnar wrote:
> Basically with the removal of TIMER_BH i think the time is right to get
> rid of old BHs forever, and to do a massive cleanup of all related fields.
> The following five basic 'execution context' abstractions are supported by
> the kernel:
>
> - hardirq
> - softirq
> - tasklet
> - keventd-driven task-queues
> - process contexts

To be pedantic (and perhaps more clear), I think keventd can really be
considered part of process context, because that's schedule_task()'s main
purpose - to provide a process context for code that otherwise might not
reliably have one.

And to go off on a tangent a bit, there are three improvements in this
area that (IMO) are worth considering:

1) keventd's worth is diminished, I think, due to what I see as an open
question about its usage: it provides a process context, and that's why
most code uses it. And the main reason why people want process context
is to potentially sleep. So by extension, since any task already in the
queue may sleep, one really doesn't know how long a schedule_task()
request will wait before it is actually run.

At least in driver code I see common patterns -- especially error
handling paths -- which could benefit from some code in interrupt
context saying "run my error handling path in process context." This is
a one-shot task that's perfect for keventd... if it weren't for the fact
that worst case, you don't really know if your error handling path will
even get called in the next 5 minutes.

I have a suggested solution, but it seems heavyweight to me and easily
shoot-down-able: have a keventd manager that manages the keventd queue.
A second thread actually does the work of providing a process context.
Third, fourth, etc. threads are created if work starts piling up
because earlier work threads are taking a long time to complete. Cap
the number of threads at a maximum, and kill off idle threads [i.e.
maintain a thread pool].
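
To make the grow/shrink rule above concrete, here is a minimal sketch of
the decision a pool manager might make on each pass. It is plain C with no
threads and no kernel API; the names (pool_state, pool_decide) and the
limits of 1 and 4 workers are invented for illustration only.

```c
#include <stddef.h>

/* Hypothetical limits; nothing here is an existing kernel interface. */
#define POOL_MIN_WORKERS 1
#define POOL_MAX_WORKERS 4

struct pool_state {
    size_t workers;   /* worker threads currently alive */
    size_t busy;      /* workers currently running a task (<= workers) */
    size_t backlog;   /* tasks queued and not yet started */
};

enum pool_action { POOL_NOTHING, POOL_SPAWN, POOL_REAP };

/* Decide what the manager should do on one pass over the pool. */
enum pool_action pool_decide(const struct pool_state *s)
{
    size_t idle = s->workers - s->busy;

    /* Work is waiting and every worker is tied up: grow, up to the cap. */
    if (s->backlog > 0 && idle == 0 && s->workers < POOL_MAX_WORKERS)
        return POOL_SPAWN;

    /* Nothing queued and a worker is idle: shrink toward the minimum. */
    if (s->backlog == 0 && idle > 0 && s->workers > POOL_MIN_WORKERS)
        return POOL_REAP;

    return POOL_NOTHING;
}
```

The manager would call something like pool_decide() whenever a task is
queued or a worker goes idle, and create or retire one thread per call.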

But in any case, I think keventd has a lot of utility, and is very
under-utilized at present.

2) Given the above, I think a better driver interface is to simply pass
in a function pointer, and a user data pointer. No need for any special
task queue structure. Drivers call
        schedule_task(my_func, my_context_specific_data);
and the driver's callback looks like
        void my_func(void *data)
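
As a rough userspace illustration of that calling convention (not kernel
code, and not the current schedule_task(), which takes a task queue
structure), the sketch below queues a bare function pointer plus a data
pointer and lets a single worker drain them in FIFO order. The names
sketch_schedule_task() and sketch_run_pending(), the fixed-size ring, and
the lack of any locking are assumptions made purely for the example.

```c
#include <stddef.h>

#define QUEUE_LEN 16    /* arbitrary capacity for the sketch */

typedef void (*task_fn)(void *data);

struct task {
    task_fn fn;
    void *data;
};

static struct task queue[QUEUE_LEN];
static size_t head, tail, count;

/* Queue fn(data) for later execution; returns 0 on success, -1 on failure. */
int sketch_schedule_task(task_fn fn, void *data)
{
    if (fn == NULL || count == QUEUE_LEN)
        return -1;
    queue[tail].fn = fn;
    queue[tail].data = data;
    tail = (tail + 1) % QUEUE_LEN;
    count++;
    return 0;
}

/* What the worker would do: run everything queued, oldest first. */
size_t sketch_run_pending(void)
{
    size_t ran = 0;

    while (count > 0) {
        struct task t = queue[head];

        head = (head + 1) % QUEUE_LEN;
        count--;
        t.fn(t.data);   /* the callback is free to sleep here */
        ran++;
    }
    return ran;
}

/* Example driver callback with the proposed signature. */
void my_func(void *data)
{
    int *errors_handled = data;

    (*errors_handled)++;
}
```

A caller would do sketch_schedule_task(my_func, &counter) from the point
where the error is detected, and the worker would later call
sketch_run_pending(). A real version callable from interrupt context would
need a lock around the queue.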

3) Maybe this is an improvement worth considering, maybe not: I think
it would be useful to have "process context timers." What I mean by
this is, make it easier to implement code that can be boiled down to:
        run in process context
        sleep for a little while
Sure I can do this by implementing a timer that does nothing more than
call schedule_task(), but that seems a little redundant if done in
multiple places :)

Ethernet link state machines, pcmcia-16's multi-stage state machine,
etc. are examples of this. Ethernet link state machines are often
handled via the standard timers, and I would prefer to be able to sleep
and do process-context-y stuff instead. Talking to Ethernet PHYs often
takes quite a while...
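
A minimal sketch of what a "process context timer" could look like,
again as a userspace simulation with a caller-supplied tick counter
rather than real kernel timers. All names here (sketch_schedule_delayed,
sketch_run_expired, link_sm) are hypothetical, and the 5-tick re-arm
interval is arbitrary.

```c
#include <stddef.h>

#define MAX_DELAYED 8   /* arbitrary capacity for the sketch */

typedef void (*task_fn)(void *data);

struct delayed_task {
    task_fn fn;
    void *data;
    unsigned long expires;  /* tick at which fn becomes runnable */
    int in_use;
};

static struct delayed_task delayed[MAX_DELAYED];

/* Ask for fn(data) to run in (simulated) process context at tick `expires`. */
int sketch_schedule_delayed(task_fn fn, void *data, unsigned long expires)
{
    for (size_t i = 0; i < MAX_DELAYED; i++) {
        if (!delayed[i].in_use) {
            delayed[i] = (struct delayed_task){ fn, data, expires, 1 };
            return 0;
        }
    }
    return -1;  /* table full */
}

/* Worker pass: run every task whose expiry is at or before `now`. */
size_t sketch_run_expired(unsigned long now)
{
    size_t ran = 0;

    for (size_t i = 0; i < MAX_DELAYED; i++) {
        if (delayed[i].in_use && delayed[i].expires <= now) {
            struct delayed_task t = delayed[i];

            delayed[i].in_use = 0;  /* free the slot so fn may re-arm */
            t.fn(t.data);
            ran++;
        }
    }
    return ran;
}

/* Hypothetical multi-stage link bring-up: each stage re-arms the next. */
struct link_sm {
    int stage;          /* 0 = reset, 1 = negotiating, 2 = up */
    unsigned long now;  /* caller-maintained copy of the current tick */
};

void link_sm_step(void *data)
{
    struct link_sm *sm = data;

    sm->stage++;        /* a real handler could sleep or poll the PHY here */
    if (sm->stage < 2)  /* not up yet: check again 5 ticks from now */
        sketch_schedule_delayed(link_sm_step, sm, sm->now + 5);
}
```

The point of the sketch is that the driver only writes link_sm_step();
the "timer that does nothing more than call schedule_task()" lives in one
shared place instead of being repeated in every driver.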

I think these improvements would also serve to encourage coders to use
process context whenever possible (not only is it the most flexible,
it's easy too!), which is always a good thing.

Comments anyone?

> i believe task-queues should not be removed from the kernel altogether.
> It's true that they were written as a candidate replacement for BHs
> originally, but they do make sense in a different way: it's perhaps the
> easiest interface to do deferred processing from IRQ context, in
> performance-uncritical code areas. They are easier to use than tasklets.

I have a theory that most if not all "task queues" can be boiled down
into keventd-tasks, which allows for further [if minor] simplifications
such as the function-calling simplification in #2 above.

> i have moved all the taskqueue handling code into kernel/context.c, and
> only kept the basic 'queue a task' definitions in include/linux/tqueue.h.
> I've converted three of the most commonly used BH_IMMEDIATE users:
> tty_io.c, floppy.c and random.c. [random.c might need more thought
> though.]

makes sense

> scalable timers: i've further improved the patch ported to 2.5 by wli and
> Dipankar. There is only one pending issue i can see, the question of
> whether to migrate timers in mod_timer() or not. I'm quite convinced that
> they should be migrated, but i might be wrong. It's a 10 lines change to
> switch between migrating and non-migrating timers, we can do performance
> tests later on.

As an aside, most drivers seem to use mod_timer().

Regards,

        Jeff
