Hello!
> The context-switch argument I'll believe if I see numbers. You'll
> probably need in excess of tens of thousands of irqs/sec to even be
> able to measure its overhead. (Workqueues are driven by nice kernel
> threads, so there's no TLB overhead, etc.)
It was the authors of the patch who were supposed to give some numbers,
at least one or two, just to prove the concept. :-)
According to my measurements (possibly wrong), on a 2.5GHz P4 a tasklet
schedule-and-execute cycle eats ~300ns, while a workqueue eats ~4usec.
On my 1.8GHz Pentium M notebook (UP kernel), the numbers are 170ns and 1.2usec.
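Just to illustrate what such a measurement can look like, here is a minimal
sketch of a module that timestamps the schedule-to-run delay once for a
tasklet and once for a work item. This is my own illustration, not the
harness behind the numbers above; a real benchmark would loop many times
and average.

/*
 * Sketch: one-shot tasklet-vs-workqueue scheduling-latency probe.
 */
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <linux/ktime.h>
#include <linux/completion.h>

static ktime_t t_start;
static DECLARE_COMPLETION(tasklet_done);
static DECLARE_COMPLETION(work_done);

/* Tasklet path: runs in softirq context, no context switch. */
static void bench_tasklet_fn(unsigned long data)
{
	printk(KERN_INFO "tasklet schedule->run: %lld ns\n",
	       ktime_to_ns(ktime_sub(ktime_get(), t_start)));
	complete(&tasklet_done);
}
static DECLARE_TASKLET(bench_tasklet, bench_tasklet_fn, 0);

/* Workqueue path: runs in a kernel thread, so a wakeup plus switch. */
static void bench_work_fn(struct work_struct *w)
{
	printk(KERN_INFO "workqueue schedule->run: %lld ns\n",
	       ktime_to_ns(ktime_sub(ktime_get(), t_start)));
	complete(&work_done);
}
static DECLARE_WORK(bench_work, bench_work_fn);

static int __init bench_init(void)
{
	t_start = ktime_get();
	tasklet_schedule(&bench_tasklet);
	wait_for_completion(&tasklet_done);

	t_start = ktime_get();
	schedule_work(&bench_work);
	wait_for_completion(&work_done);
	return 0;
}

static void __exit bench_exit(void)
{
	tasklet_kill(&bench_tasklet);
	flush_scheduled_work();
}

module_init(bench_init);
module_exit(bench_exit);
MODULE_LICENSE("GPL");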
Anyway, all the uses of tasklets should be verified:
The most dubious place is the popular Neterion 10Gbit driver, which uses
tasklets the way acenic does. But at 10Gbit, multiply the acenic numbers
accordingly and panic. :-)
Also, there exists some hardware which uses tasklets even harder,
but I have no idea what the real frequencies are: e.g. sundance.
The acenic/s2io case is quite special: normally, network drivers refill
their RX queues in the irq handler. It was Jes Sorensen's observation
that offloading the refill out of the irq handler improves performance;
I do not remember the numbers. Maybe switching to workqueues will not
affect performance at all, maybe it will just make it collapse; no idea.
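The shape of that pattern is roughly the following. All the mynic_* names
and struct fields here are made up for illustration; only the structure
(the hard irq handler defers the skb-allocating refill to a tasklet)
mirrors what acenic/s2io do.

/* Hypothetical driver fragment: defer the RX ring refill out of the
 * hard irq handler into a tasklet, in the acenic/s2io style. */
#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct mynic {
	struct net_device *dev;
	struct tasklet_struct refill_tasklet;
	/* ... rx ring state ... */
};

static void mynic_refill_rx(unsigned long data)
{
	struct mynic *nic = (struct mynic *)data;

	/* Allocate and post fresh RX buffers; the skb allocation is
	 * the slow part worth keeping out of hard irq context. */
	/* mynic_alloc_rx_buffers(nic); */
}

static irqreturn_t mynic_interrupt(int irq, void *dev_id)
{
	struct mynic *nic = dev_id;

	/* Ack the hardware, hand received packets to the stack... */

	/* ...and defer the refill instead of doing it here. */
	tasklet_schedule(&nic->refill_tasklet);
	return IRQ_HANDLED;
}

static void mynic_setup(struct mynic *nic)
{
	/* Remember to tasklet_kill() this on teardown. */
	tasklet_init(&nic->refill_tasklet, mynic_refill_rx,
		     (unsigned long)nic);
}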
A tasklet is single-threaded by definition and purpose. Those few places
where people used tasklets to do per-cpu jobs (RCU, f.e.) exist only
because they had trouble allocating a new softirq.
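A hedged sketch of that per-cpu trick, with my own names (old RCU's
actual code differs): one tasklet per CPU, each serialized against
itself, scheduled from the local CPU.

#include <linux/interrupt.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/smp.h>

static DEFINE_PER_CPU(struct tasklet_struct, percpu_tasklet);

static void percpu_job(unsigned long data)
{
	/* Runs in softirq context on the CPU that scheduled it;
	 * a given tasklet never runs concurrently with itself. */
}

static void init_percpu_tasklets(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		tasklet_init(&per_cpu(percpu_tasklet, cpu),
			     percpu_job, 0);
}

/* Called from irq/timer context, i.e. pinned to the local CPU. */
static void kick_local_job(void)
{
	tasklet_schedule(&per_cpu(percpu_tasklet, smp_processor_id()));
}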
> The only remaining argument is latency:
You could set a real-time priority by default, not a poor nice -5.
If some network adapters were killed just because I ran some task
with nice --22, it would be just ridiculous.
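For the record, giving a dedicated worker thread a real-time priority is
a one-liner. A sketch under my own assumptions (the names and the
priority value 50 are arbitrary; newer kernels no longer export
sched_setscheduler() to modules and offer sched_set_fifo() instead):

#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/err.h>

static int my_worker(void *arg)
{
	set_current_state(TASK_INTERRUPTIBLE);
	while (!kthread_should_stop()) {
		schedule();			/* sleep until woken */
		__set_current_state(TASK_RUNNING);
		/* ... drain the deferred work here ... */
		set_current_state(TASK_INTERRUPTIBLE);
	}
	__set_current_state(TASK_RUNNING);
	return 0;
}

static struct task_struct *start_rt_worker(void)
{
	struct sched_param param = { .sched_priority = 50 };
	struct task_struct *t;

	t = kthread_create(my_worker, NULL, "my_rt_worker");
	if (!IS_ERR(t)) {
		/* SCHED_FIFO: immune to nice levels entirely. */
		sched_setscheduler(t, SCHED_FIFO, &param);
		wake_up_process(t);
	}
	return t;
}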